My wife was a champion table tennis player. This sport uses Elo as well, and I know from watching the sport over time that the rating system has real problems. It doesn't suffer from the weaknesses that you cite, but even so, the problem of "rating inflation" is widely discussed.
It seems that much of the problem comes from rating points brought in by newbie players (and note that, contra TFA, the problem isn't with experienced players losing to newbies, but the opposite).
A newbie is started off with some nominal rating; I forget the number, but let's say it's 800. Most likely that newbie is going to lose his first matches, and some proportion of those newbies will get frustrated and quit. For the ones that stay in the game, things probably work out in the long run. But for those that got discouraged and quit, in the course of their loss they caused a few points (not many, because they're likely way overmatched, but definitely more than 0) to be credited to their opponents. When they quit the sport, they're never going to reclaim any of the rating points that they lost initially. But those points are still in the system, having been added to their winning opponents.
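To put rough numbers on that, here's a minimal sketch using the textbook Elo update. The K-factor of 32, the 800 starting rating, and the opponent ratings are all assumptions for illustration; the actual table tennis rating rules may use a different point-exchange scheme.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(winner, loser, k=32.0):
    """Return (new_winner, new_loser) after one decisive game.

    The update is zero-sum: whatever the winner gains, the loser loses.
    k=32 is an assumed K-factor, not the real table tennis value.
    """
    gain = k * (1.0 - expected_score(winner, loser))
    return winner + gain, loser - gain


def points_left_behind(newbie_start=800.0,
                       opponents=(1400.0, 1300.0, 1200.0),
                       k=32.0):
    """Points a newbie hands to the pool by losing every game, then quitting."""
    newbie = newbie_start
    for opponent in opponents:
        _, newbie = elo_update(opponent, newbie, k)
    return newbie_start - newbie


if __name__ == "__main__":
    print(f"points left in the system: {points_left_behind():.2f}")
```

Each loss only moves a point or two because the newbie is so heavily overmatched, but the total is always positive, and once the newbie quits nobody is left to win those points back.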
It's hard to quantify because the Elo system is the only objective comparison we have, but over the course of the almost 30 years I've been watching my wife play, the Elo rating enjoyed by a player of a given hypothetical skill level has increased dramatically. Many are saying that for someone of the upper echelons, their rating is maybe 200 points higher than it would have been 30 years ago.
So back in 1991, my wife was in the top 30 women in the USA with a rating in the mid-1700s. Today, someone with that rating isn't even going to be in the top brackets of a serious tournament.
Despite all that, the rating system stays in use because it is a valuable tool. The ability to match players who have never seen each other before, ensuring interesting matches, seems to be part of what keeps the game competitive for those in it. Because of this, table tennis is also, I believe, one of the few sports where men and women often play head-to-head (even though men generally have much higher ratings, on account of the sport requiring far more strength than you might suspect).
I don't think there's an expectation that a skill rating is comparable across 20 years, because both the individual players and how the game is played (the meta) change continuously.
But if that's true, then why would rating inflation be a problem?
The game itself has not changed, so it still makes sense to compare players across time. It would be nice if we had a quantitative way of doing this, so we could make statements like "the average professional player today is better than 20 years ago; a typical modern pro would win 60% of the time against one from 20 years ago".
In some sense, it is not surprising that we do not have a system that accomplishes this. Since it is impossible to see the results of a game between players living in different time periods, we cannot get any data to prevent drift. You can still try to normalize the rankings. However, unless you have some independent way of measuring skill, you would need to make an assumption about the relative strength of players. Assuming the average skill of a professional is constant across time is probably not accurate, but it is closer to reality than what you get with unchecked inflation.
You can sort of solve the inflation problem by z-scoring the Elo ratings. A person's score then tells you how many standard deviations better or worse they are than the average player, assuming an underlying normal distribution (reasonable).
Of course, scores will only be comparable if the average skill of all players remains constant. I would imagine this isn't true, but the drift over several decades is probably small.
Unless you start introducing some purely objective criteria for skill, which can never work, this is the best you can do. It's still way, way better than a straight Elo system though.
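As a rough sketch of what z-scoring would look like, assuming you have a snapshot of everyone's rating for a given era. The player names and ratings below are made up, and the "today" pool is simply the 1991 pool shifted up by 200 points to mimic uniform inflation:

```python
from statistics import mean, pstdev


def zscore_ratings(ratings):
    """Map each player's rating to standard deviations above/below the pool mean."""
    values = list(ratings.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of this snapshot
    if sigma == 0:
        raise ValueError("all ratings are identical; z-scores are undefined")
    return {name: (rating - mu) / sigma for name, rating in ratings.items()}


# Hypothetical snapshots: same relative spread, 200 points of inflation.
pool_1991 = {"A": 1750.0, "B": 1500.0, "C": 1250.0, "D": 1000.0}
pool_today = {name: rating + 200.0 for name, rating in pool_1991.items()}

z_1991 = zscore_ratings(pool_1991)
z_today = zscore_ratings(pool_today)

if __name__ == "__main__":
    for name in pool_1991:
        print(name, round(z_1991[name], 3), round(z_today[name], 3))
```

A uniform shift drops out entirely, so the two eras get identical z-scores. This only tells you where someone sits relative to their own pool, which is exactly the "average skill stays constant" assumption mentioned above.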
Rating distributions are often not normal, because some subset of players study the game and take it more seriously, resulting in a bimodal distribution. See [0] for an example in chess.
Even without the bimodality, you wouldn't expect a normal distribution of ratings.
1. Assume that chess ability is normally distributed in the population.
2. Assume that people who are terrible at chess are more likely to stop playing chess than people who are successful.
Then you've sampled the underlying normal distribution mostly from the top end, and that new, highly skewed distribution is what you'll see when you measure everyone's rating.
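Here's a small simulation of those two assumptions, as a sketch rather than a model of any real player pool. The logistic "probability of staying" curve and its `steepness` parameter are invented for illustration; any dropout rule that favors stronger players should show the same qualitative effect.

```python
import math
import random


def simulate_survivors(n=20000, mu=0.0, sigma=1.0, steepness=0.25, seed=0):
    """Draw normal abilities, then let weaker players quit more often.

    p_stay rises with ability along a logistic curve (an assumed dropout
    rule). Returns (population, survivors) as lists of abilities.
    """
    rng = random.Random(seed)
    population = [rng.gauss(mu, sigma) for _ in range(n)]
    survivors = []
    for ability in population:
        p_stay = 1.0 / (1.0 + math.exp(-(ability - mu) / (steepness * sigma)))
        if rng.random() < p_stay:
            survivors.append(ability)
    return population, survivors


def skewness(xs):
    """Sample skewness: third central moment over variance ** 1.5."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / len(xs)
    third = sum((x - m) ** 3 for x in xs) / len(xs)
    return third / var ** 1.5


if __name__ == "__main__":
    population, survivors = simulate_survivors()
    print("population skew:", round(skewness(population), 3))
    print("survivor skew:  ", round(skewness(survivors), 3))
```

The full population comes out roughly symmetric, while the survivors have a higher mean and a longer right tail, so the measured ratings would not look normal even though the underlying ability is.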
The idea that chess has not changed in a long time is simply not true. Two huge and relatively recent changes were the addition of chess clocks and premoves.
And aside from the mechanics of how the game is played, there have been massive changes in the popularity of chess (first massively upwards, recently possibly down slightly), as well as how analyses are done.
It would be very difficult to account for these factors in a way that keeps comparisons across 30-year+ time spans meaningful.
This might not be great for a sporty-sport, but I think that for a video game this would actually be an advantage. This kind of rating inflation would mean that long-term players would see some numerical progress without really doing much better.
> A newbie is started off with some nominal rating; I forget the number, but let's say it's 800. Most likely that newbie is going to lose his first matches, and some proportion of those newbies will get frustrated and quit.
That seems like a simple problem to fix. When somebody quits, just subtract 800 points from the remaining ranked players, scaled accordingly such that their relative win probabilities remain the same.
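One way to read this suggestion, sketched below: when a player quits, work out how many points they left behind (starting rating minus their final rating) and take that amount back from the active pool as a uniform shift. Since Elo win probabilities depend only on rating differences, a uniform shift leaves them untouched. The function name and the sample numbers are hypothetical.

```python
def remove_leaked_points(active_ratings, start_rating, quit_rating):
    """Shift every active rating down by an equal share of the leaked points.

    leaked = start_rating - quit_rating is what the quitter left in the pool.
    A uniform shift preserves all rating differences, and therefore all
    Elo win probabilities between the remaining players.
    """
    if not active_ratings:
        raise ValueError("no active players to adjust")
    leaked = start_rating - quit_rating
    shift = leaked / len(active_ratings)
    return {name: rating - shift for name, rating in active_ratings.items()}


if __name__ == "__main__":
    pool = {"A": 1400.0, "B": 1300.0, "C": 1200.0}
    # Hypothetical quitter: started at 800, left at 794.
    print(remove_leaked_points(pool, start_rating=800.0, quit_rating=794.0))
```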
Of course, the other issue is if the number of active players increases over time. In that case, it's not so easy to fix unless you start scaling down the number of starting points given to new players.
Perhaps a better thing to do would be to construct a model of the rating inflation over time and use that to correct for historical comparisons. It's still not particularly meaningful though, because you have no way to measure actual skill inflation.
You don't have to formally quit the game to stop playing. I played one ranked chess tournament in high school, quit for ten years, and then picked it back up. What would you do with my points?
If you choose to delete them, that means that everyone will have constantly eroding ratings unless they keep playing.
> It doesn't suffer from the weaknesses that you cite, but even so, the problem of "rating inflation" is widely discussed.
Ah yes! Inflation is also a problem I've seen in competitive online games. Rating inflation was a serious issue with World of Warcraft PvP arenas circa 10 years ago (iirc Blizzard hard capped arena ratings at 3000 during WotLK). I don't follow chess much, and I'm not exactly sure how chess avoids it (or even if it does).
By the point you're playing ranked matches in chess, you're generally invested enough to keep playing. However, chess has a (statistically) significant inflation problem, to the point where you can only compare scores within the same decade or so meaningfully.
It seems there was a lot of rating inflation in chess, but at the top level, at least, it's stopped - the number of players over 2700 has been pretty constant for 5-10 years, a few dozen players. In 1990, only Kasparov and Karpov were rated over 2700.
There's also an inherent deflation effect. Players tend to get better over time. In the simplest case, if we start with a pool of players rated 800 and let them play for a year, at the end they'll be better players but still rated 800 on average.
Most chess Elo systems have an inflationary component where young or new players (who are overall faster improvers than the player pool at large) gain and lose points faster than established players (in detail, either using performance ratings or increased k-factors or both). In a balanced rating system, the sources of inflation and deflation are roughly equal. You can tweak the parameters to keep it this way, though it's not trivial to tell whether there is "real" inflation over the years or whether players are simply playing better - or indeed, what's the difference.
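To illustrate the K-factor part of this, here's a sketch of a single game where the two players use different K-factors. The values 40 and 20 and the ratings are assumptions chosen for the example, not a statement of any federation's current rules.

```python
def expected_score(rating_a, rating_b):
    """Textbook Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_with_k(rating_a, rating_b, score_a, k_a, k_b):
    """Update both players, each with their own K-factor.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for a loss.
    With k_a != k_b the update is no longer zero-sum.
    """
    surprise = score_a - expected_score(rating_a, rating_b)
    return rating_a + k_a * surprise, rating_b - k_b * surprise


def net_points_created(rating_a, rating_b, score_a, k_a, k_b):
    """Change in the pool's total rating caused by one game."""
    new_a, new_b = update_with_k(rating_a, rating_b, score_a, k_a, k_b)
    return (new_a + new_b) - (rating_a + rating_b)


if __name__ == "__main__":
    # Underrated newcomer (K=40) beats an established player (K=20).
    print(net_points_created(1200.0, 1500.0, 1.0, k_a=40.0, k_b=20.0))
    # Same K for both: zero-sum, nothing is created.
    print(net_points_created(1200.0, 1500.0, 1.0, k_a=20.0, k_b=20.0))
```

When the high-K player outperforms their rating, points are created; when they underperform, points are destroyed. So the higher K is only inflationary on balance if new players are, as described above, improving faster than the rest of the pool.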
Why don't they increase the bar for newbies to get into such a system?
If they know that some people just play a few games and then quit, they could award an Elo rating only after a player has played for a specific amount of time, won at least n games, etc.
There is a minimum of 10 games before people start being ranked. People who quit early don't get ranked. People who have played 10 games gain a new long-term goal.