Rating Players: Trials and Tribulations

Ahoy, commander! Now that we've got some improvements to creating content on iToysoldiers, I figured it'd be time to start improving the stats and metrics around your miniature wargaming battles.  Specifically, I'm working on rating players based upon their performance. In short, I'm implementing a modified version of the ELO rating system and I'm going to be applying it pretty much everywhere average battle score is used.

Average Battle Score Sucks

Let me talk a little bit about why this change to an ELO-based player rating is important.  Right now, iToysoldiers uses your average battle score to determine your position in various leaderboards.  This isn't a bad solution because it does show your standing relative to other players.  It's also pretty handy for gauging a player's strength against assorted meta items like the mission being played.

The HUGE problem with this system is that it assumes each player has a similar number of reported games.  Players who've played lots of games will have a reasonably accurate score, and folks who've played only a few games will have a ridiculously inaccurate one.

For example: If I've played three games and they're all wins, my average battle score would be 5.  If my buddy Tyler has played 400 games and they're all wins, then his score would also be 5.  If he lost just one of those games, it'd put him beneath me in the rankings.  Who's really the better player?  Do we know yet?  Should my score, with only three games, be compared against Tyler's 400?  Is there a comparison to be made?  Maybe.  Maybe not.  But unless we're looking at a similar number of games, average battle score doesn't really tell us much.  In fact, it punishes players for reporting more battles, and that's something I certainly don't want to have happen.
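To put numbers on that, here's a quick sketch in Python.  The win score of 5 comes from the example; the loss score of 1 is purely an assumption for illustration, since the actual value a loss earns isn't stated here.

```python
def average_battle_score(scores):
    """Plain arithmetic mean of a player's reported battle scores."""
    return sum(scores) / len(scores)

WIN = 5   # score for a win, taken from the example above
LOSS = 1  # ASSUMPTION: illustrative loss score, not the site's real value

me = [WIN] * 3                 # three games, all wins
tyler = [WIN] * 399 + [LOSS]   # 400 games, a single loss

print(average_battle_score(me))     # 5.0
print(average_battle_score(tyler))  # 4.99
```

One loss in 400 games is enough to drop Tyler below a player with three games on record, which is exactly the ranking problem described above.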

Rating Players using Chess as a Model

Enter ELO as a model for determining a player's rating.  The premise behind ELO is that, with a little fun with math, a player's rating can be determined from past performance and then adjusted after each match based on the result and the opponent's rating.  The resulting rating is normalized to a value that can be used to compare players whether they've played 400 games or 3 games.  (Yes, yes.  I know...  The results will still be more accurate the more games that are played, but the concept is sound.)
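For anyone who hasn't seen the math, here's a minimal sketch of the standard chess-style calculation.  This is the textbook formula, not the modified version being built for iToysoldiers, and the starting rating of 1500 and K of 32 are just common conventions assumed for the example.

```python
def expected_score(rating, opponent_rating):
    """Chance of winning implied by the rating gap (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))

def update_rating(rating, opponent_rating, result, k=32):
    """Return the new rating after one match.

    result is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    k=32 is an assumed, commonly used value.
    """
    return rating + k * (result - expected_score(rating, opponent_rating))

# Two evenly rated players: the winner gains 16 points, the loser drops 16.
print(update_rating(1500, 1500, 1.0))  # 1516.0
print(update_rating(1500, 1500, 0.0))  # 1484.0
```

Because the change depends on how surprising the result was, beating a much higher rated opponent moves a rating more than beating an equal one.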

What I'm working on now is implementing a modified ELO system into iToysoldiers so that it'll be possible to get reasonably accurate ratings of players overall, and possibly (still working on this) under specific sets of battle meta. I'm aware of some of the drawbacks of the ELO model and how other rating systems are possibly more accurate because they take a couple of other factors into account - things like the confidence level of the current rating and stuff like that.  The biggest issue, to my mind, is that ELO (and Glicko - another player rating system - for that matter) assumes that both sides start out equal.  Example: In chess, both players have exactly the same pieces and the same "abilities".  In miniature wargaming that's not the case.  Factions have varying powers that contribute to the power of the player.  I'm aware of this, hence the modifications.

Keeping me up at night

Modifying the ELO player rating calculations to take elements of miniature wargaming into account has been what's really making my head spin the last week or so.  A couple of things have been weighing heavily on how this is going to work:

  • How will a player's faction affect their player rating?
  • Some players' opponents won't be on iToysoldiers, so what rating gets used for the opponent in the calculation?
  • What's the definition of a "provisional player"?
  • What happens when players are playing narrative battles where both sides aren't necessarily considered even?
  • Do people care about their rating on a faction by faction (or game by game) level or is their overall rating acceptable?

Faction Impact on Rating Players

The biggie? How factions can or should impact rating players. I've decided on two factors:

First, if a player's game is not acknowledged by another player on iToysoldiers then the rating change will be based upon the current rating of the opposing faction.  Yep.  That means that I'll be generating ratings for the various factions of the game.  There are a couple of cool things I'll be able to do with that info.  The second is a bit more... dicey, maybe?

So within ELO there's a concept called "K". K is a constant that determines how much a match matters when rating the players.  Basically, it sets the biggest change in rating that can happen from a single game.  In chess, the K value is only based upon a player's rating and how many games they've played.  I'm adding in another factor: The relative strength of the factions.  Example: In 40K Orks have had a rough go.  Their codex hasn't been considered to be terribly competitive.  Craftworld Eldar, on the other hand, are generally considered to be pretty top tier.  Gotta tell you, the test runs of the data I have pretty much bear this out.  It makes sense to me that someone who's playing Orks against Eldar shouldn't lose as many points if they're defeated by Eldar.  Everyone saw that coming, right? On the flip side, the Eldar player shouldn't get quite as many points for defeating an Ork player 'cause their faction is superior.  I'm incorporating that concept into the ratings.
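Here's one way those two factors could be sketched in Python.  To be loud about it: the actual iToysoldiers formula isn't spelled out here, so the function names, the scaling rule in `faction_adjusted_k`, and every rating number below are hypothetical and only meant to show the shape of the idea.

```python
def expected_score(rating, opponent_rating):
    """Standard ELO win expectancy from the rating gap."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / 400))

def opponent_rating_for(opponent_player_rating, opposing_faction_rating):
    """Factor one: fall back to the opposing faction's rating when the
    game wasn't acknowledged by an opponent (player rating is None)."""
    if opponent_player_rating is not None:
        return opponent_player_rating
    return opposing_faction_rating

def faction_adjusted_k(base_k, own_faction_rating, opposing_faction_rating, result):
    """Factor two (HYPOTHETICAL rule): scale K by the faction mismatch.

    faction_edge is how favored your faction is, on faction ratings alone.
    Equal factions give 0.5, which leaves K unchanged.
    """
    faction_edge = expected_score(own_faction_rating, opposing_faction_rating)
    if result == 1.0:   # win: the stronger faction earns less, the weaker more
        return base_k * 2 * (1 - faction_edge)
    if result == 0.0:   # loss: the weaker faction loses less, the stronger more
        return base_k * 2 * faction_edge
    return base_k       # draw: left alone in this sketch

def update_rating(rating, opponent_rating, result, k):
    """Apply the usual ELO update with whatever K was chosen."""
    return rating + k * (result - expected_score(rating, opponent_rating))

# Made-up numbers: an Ork player (1500) loses an unacknowledged game to Eldar.
ORK_FACTION, ELDAR_FACTION = 1400, 1600   # illustrative faction ratings
opponent = opponent_rating_for(None, ELDAR_FACTION)
k = faction_adjusted_k(32, ORK_FACTION, ELDAR_FACTION, result=0.0)
print(update_rating(1500, opponent, 0.0, k))   # a smaller drop than plain K=32
```

In this sketch a loss for the weaker faction and a win for the stronger faction get the same reduced K, so the Ork player loses less and the Eldar player gains less from the same game.  Having the stronger faction lose more on an upset is a side effect of this particular rule, not something stated above.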

A nifty side effect is that the standard deviation of the faction ratings within a game system will also give an indicator as to the balance of that game: the tighter the spread, the more even the factions.  I'm pretty excited about being able to provide that kind of info.
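As a rough sketch of that balance indicator, assuming faction ratings are kept in a simple name-to-rating mapping (the faction names and numbers below are invented for illustration, not real data):

```python
from statistics import pstdev

def balance_spread(faction_ratings):
    """Population standard deviation of a game system's faction ratings.

    A smaller value means the factions sit closer together in rating.
    """
    return pstdev(list(faction_ratings.values()))

# Invented ratings for two imaginary game systems.
tight_game = {"Faction A": 1490, "Faction B": 1500, "Faction C": 1510}
wide_game = {"Faction A": 1300, "Faction B": 1500, "Faction C": 1700}

print(balance_spread(tight_game) < balance_spread(wide_game))  # True
```

The population form (`pstdev`) is used here on the assumption that every faction in the game is included; with only a sample of factions, `statistics.stdev` would be the usual choice.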


And that's where my head's at. iToysoldiers will be rating players by the end of September at the latest.  Have some thoughts on the matter? Feedback? Comment here.  Email me.  Visit the iTS Support Portal Entry for Player Ratings.  I aim to please and I thrive on your commentary.

Carpe Acies!
Rob @ iToysoldiers