Battle of the Sexes: Rating Changes

Siming won the match by winning the first two sets with scores of 21 to 19 and 21 to 20.

What does this do to the ratings?

We are often asked, in reference to Fargo Rating changes, "What is the formula?" And sometimes, when people don't see or can't find a formula, they suspect we are being secretive about how rating changes are calculated. We are not. It turns out there is no formula. Computing rating changes following even a single matchup like this is a major exercise that uses a bank of dozens of computers in the cloud. The complication is that every player's rating depends on every other player's rating. So if, tentatively, my rating changes a bit, then every other player's rating changes a bit in response. But if everyone else's rating changes a bit, mine is no longer right. So mine changes, and then every other rating changes again. This continues until all ratings are consistent and optimal. Still, even without a formula, there are some things we can understand about rating changes.
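To make the "adjust, then readjust" idea concrete, here is a minimal sketch of that kind of loop. It is not the Fargo Rating computation, which this article does not spell out. It is the classic iterative fit for a simple win-probability model (Bradley-Terry), swapped in as a stand-in. The three players, their game counts, the 500-point anchor, and the scale on which 100 points corresponds to 2-to-1 game odds are all assumptions made for illustration.

```python
import math


def fit_ratings(wins, tol=1e-9, max_passes=10_000):
    """Fit ratings to game counts by repeated adjustment.

    wins[(a, b)] is the number of games a won against b.
    Every player needs at least one win for this simple sketch to work.
    Returns (ratings, passes_used).
    """
    players = sorted({p for pair in wins for p in pair})
    strength = {p: 1.0 for p in players}  # tentative starting point
    for passes in range(1, max_passes + 1):
        updated = {}
        for p in players:
            won = sum(n for (a, _), n in wins.items() if a == p)
            # Each game p played, weighted by the current tentative
            # strengths of both players involved.
            weight = sum(
                n / (strength[a] + strength[b])
                for (a, b), n in wins.items()
                if p in (a, b)
            )
            updated[p] = won / weight
        # Only differences matter, so pin the geometric mean at 1.
        mean_log = sum(math.log(s) for s in updated.values()) / len(updated)
        scale = math.exp(mean_log)
        updated = {p: s / scale for p, s in updated.items()}
        change = max(abs(math.log(updated[p] / strength[p])) for p in players)
        strength = updated
        if change < tol:  # nobody moved: the ratings are mutually consistent
            break
    # Assumed scale: 100 points per doubling of odds, anchored at 500.
    ratings = {p: 500.0 + 100.0 * math.log2(s) for p, s in strength.items()}
    return ratings, passes


# Hypothetical game counts: (winner, loser) -> games won.
games = {
    ("Ann", "Bob"): 12, ("Bob", "Ann"): 8,
    ("Bob", "Cy"): 11, ("Cy", "Bob"): 9,
    ("Ann", "Cy"): 6, ("Cy", "Ann"): 4,
}
ratings, passes = fit_ratings(games)
print(passes, {p: round(r, 1) for p, r in ratings.items()})
```

Even with three players, a single pass is not enough: each pass moves every rating, which changes what every other rating should be, and the loop only stops when a pass leaves everything essentially where it was.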

The changes

Here are some rating changes following the 81 games played between Donny and Siming. In particular, you can see Donny goes up 1.5 points (to 750) and Siming goes down 2.9 points (to 780). These results suggest several questions.

Whose rating went up?

The first thing to note may be surprising to some. Despite winning the match, Siming Chen's rating went down. And despite losing the match, Donny Mills's rating went up. The reason is that we are always compared to an expectation based on our current rating. Because Siming was 35 or so points above Donny, she was expected not only to win but to win by a larger margin than she did. She fell short of this expectation, and Donny exceeded the expectation set by his rating.
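A rough back-of-the-envelope check shows the size of that expectation. The sketch below assumes a rating scale on which a 100-point gap corresponds to 2-to-1 odds of winning each game. That scale is not stated in this article, so treat it as an assumption. The game counts come from the set scores above.

```python
def expected_share(rating_gap, points_per_doubling=100.0):
    """Expected fraction of games won by the higher-rated player.

    Assumes every `points_per_doubling` points of gap doubles the odds.
    """
    odds = 2.0 ** (rating_gap / points_per_doubling)
    return odds / (1.0 + odds)


games_played = 21 + 19 + 21 + 20  # 81 games over the two sets
siming_won = 21 + 21              # 42 games
gap = 35.0                        # "35 or so points" before the match

expected_siming = expected_share(gap) * games_played
print(f"expected about {expected_siming:.1f} wins, actual {siming_won}")
```

Under that assumption, a 35-point favorite would be expected to win about 45 of 81 games. Siming won 42, so she finished a few games short of her expectation even though she won the match.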

Why is Siming's rating change bigger than Donny's?

The magnitude of Siming's rating change, 2.9 points, is almost twice Donny's 1.5-point change. There are multiple reasons for this, and it is not easy to know their relative importance.

How many games?

The first thing to look at is each player's robustness--the number of games their rating is based on. A higher robustness in general goes along with a rating that is more reliable and less sensitive to new information. Here, both players have about 2300 games in the system, so that's not a big contributor.

How recent are the games?

The games in a player's record do not carry equal value. For one thing, more recent games carry more weight. It turns out, once again, that these two players are similar on this measure. For both, the most recent 750 games go back to fall 2017, for instance.

How well known are the opponents?

Games against unknown opponents carry no information at all, and games against barely established opponents carry less information than do games against well-established opponents. Here is where there is some difference. While both players have the vast majority of their games against opponents with established ratings, Siming has nearly all her games against women, and many of those against Asian women. Donny's games, in contrast, are against a more diverse crowd.

Ratings of other top women?

A common question is how a player's rating can change when the player hasn't played. Here is a good example. Take Sha Sha Liu, for instance. Her rating is based upon play against other women, and largely other Asian women. So when Siming goes down a couple of points, that has a small effect on Sha Sha. Suddenly Sha Sha gets a bit less credit for the games she has won against Siming and is forgiven a little less for the games she lost against Siming. That by itself would be less than a tenth of a point. But it is a compounding issue. Other top women take this same hit. Then, if all the top women are down a smidge, Siming's rating goes down a bit further for the same reason. The bleeding stops when everything is optimal, when an equilibrium is reached. Here you can see that Asian women dropped by about a point, and this is a notable part of the reason Siming dropped more than Donny was raised. This is basically a small tide shift for Asian women.

What about Karen Corr?

Notice that Karen Corr is largely immune to this tide shift. Her rating dropped by a small fraction of a point. This is because Karen has a much more diverse opponent pool than do the other top women players. In fact, by looking at the other top women here, you can get a sense of which ones are coupled more diversely: they are the ones who experienced less than the full drop of 1 point or so.
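The tide shift, and the way a well-connected player is shielded from it, can be reproduced in a toy model. What follows is a sketch under stated assumptions, not the Fargo Rating computation. It uses the classic iterative fit for a simple win-probability model (Bradley-Terry), a scale of 100 points per doubling of odds, a 500-point anchor, and five hypothetical players. A1 and A2 form one pool, B1 and B2 form another, and X has games against both pools. All game counts are invented.

```python
import math


def fit(wins, passes=2000):
    """Iteratively fit ratings; wins[(a, b)] = games a won against b."""
    players = sorted({p for pair in wins for p in pair})
    s = {p: 1.0 for p in players}
    for _ in range(passes):
        new = {}
        for p in players:
            won = sum(n for (a, _), n in wins.items() if a == p)
            weight = sum(
                n / (s[a] + s[b]) for (a, b), n in wins.items() if p in (a, b)
            )
            new[p] = won / weight
        # Pin the geometric mean at 1 so ratings stay anchored.
        scale = math.exp(sum(math.log(v) for v in new.values()) / len(new))
        s = {p: v / scale for p, v in new.items()}
    # Assumed scale: 100 points per doubling of odds, anchored at 500.
    return {p: 500.0 + 100.0 * math.log2(v) for p, v in s.items()}


def even_split(pairs, games_each):
    """Hypothetical history in which each pair split its games evenly."""
    wins = {}
    for a, b in pairs:
        wins[(a, b)] = games_each
        wins[(b, a)] = games_each
    return wins


history = even_split(
    [("A1", "A2"), ("A1", "X"), ("A2", "X"),
     ("B1", "B2"), ("B1", "X"), ("B2", "X"),
     ("A1", "B1")],
    games_each=50,
)
before = fit(history)

# New information: A1 and B1 play 20 more games, and A1 wins only 8.
updated = dict(history)
updated[("A1", "B1")] += 8
updated[("B1", "A1")] += 12
after = fit(updated)

shift = {p: after[p] - before[p] for p in before}
for p in sorted(shift):
    print(p, f"{shift[p]:+.2f}")
```

In this toy setup A2 plays no new games, yet A2's rating drops along with A1's, only by less. X, whose games are spread evenly across both pools, barely moves at all. B2 likewise rises a little with B1.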

Is this an "Ah ha! the beginning of the big rebalancing" situation?

No. There is no reason to think so. This kind of rebalancing happens all the time, and after the next notable piece of new information, there is just as likely to be a shift half a point down as half a point up. In fact these are the kinds of tide shifts that happen every day when players from Alaska get games against players from Arizona or players from Quebec get games against players from Florida. As more and more data couple different groups, these tide shifts get smaller and smaller.