The closing part of the Burszteins’ presentation is dedicated to modeling and evaluating card affinities in Hearthstone for accurate game outcome prediction.
Before you get your hopes too high and I tell you how the black magic is done, just a word of disclaimer. Because Naxxramas has just been released, the meta is quickly shifting, so the predictions are not accurate. As I said, we don’t have access to a lot of games, so expect the tool to not perform very well for the next few weeks. That being said, it will catch up as soon as the newly released cards stabilize.
So, how does it work? Well, it’s very simple (see right-hand image). We model card affinities, that is, which cards are played with each other. Then we have an evaluation function which returns the cards most likely to be played, based on the cards the opponent has already played. Then the tool goes through a bunch of replays and learns the card affinities and other metrics from them. And then, as Celine showed you, we have the in-game tool which, based on the algorithm, analyzes the cards that have been played and returns the most likely cards to appear. That’s basically what it looks like. We are going to look at each step in turn, and that will be the final part of the talk.
So, card affinities. As I said, Hearthstone has about 500 cards. So if you want to look at all the card combinations, it will be almost impossible. For 30 cards in the deck, we have close to an impossible number of combinations. So what we really do is exploit the fact that in practice some cards work really well together, and some cards do not. For example, if you play Druid you have a combo, and the combo is Savage Roar plus Force of Nature. So if you have one, it’s likely you have the other. On the other hand, you do not have Force of Nature and, I don’t know, a Murloc. It doesn’t make any sense, so this is not an affinity that you need to model.
We use the simplest thing you can think of, which is called an n-gram. Here is one of the simplest versions, called a bigram (see right-hand image). We look at the replay as the sequence of cards which have been played by the opponent. So we say, well, the Armorsmith has been played and then the guy played the Taskmaster. That’s one bigram, and we record the affinity between the two. We say, well, if you play the Armorsmith, then you are likely to play the Taskmaster. And then we look at the second part of the stream, which is the Taskmaster followed by the Acolyte of Pain, so we record that there’s also an affinity between those two cards.
That being said, we know that cards are drawn at random. You get one card each turn, so the order is random. To account for that, we use what we call unordered n-grams (see right-hand image). And here are unordered bigrams. If you play Armorsmith sometime during the game, then later on you are likely to play Taskmaster. If you play the Acolyte of Pain, then you are likely to also play Armorsmith. We also tried trigrams, which are groups of three cards, and so forth. It turns out that the best model is the bigram. So, how do we evaluate card affinities? Again, very simple (see right-hand image).
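As an aside before the evaluation step, the unordered bigram counting described above could be sketched roughly as follows. This is a minimal illustration and not the speakers' actual code: the function name `learn_affinities`, the replay format (one list of card names per game) and the choice to count each pair once per game are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def learn_affinities(replays):
    """Count unordered bigrams over a collection of replays.

    `replays` is assumed to be a list of games, each game being the list of
    card names one player played. Order is ignored, and each pair of distinct
    cards is counted once per game (an assumption of this sketch).
    """
    affinity = defaultdict(Counter)
    for cards_played in replays:
        unique_cards = sorted(set(cards_played))
        for a, b in combinations(unique_cards, 2):
            # Record the pair in both directions so lookups are symmetric.
            affinity[a][b] += 1
            affinity[b][a] += 1
    return affinity


# Two made-up replays using the cards named in the talk.
replays = [
    ["Armorsmith", "Taskmaster", "Acolyte of Pain"],
    ["Acolyte of Pain", "Armorsmith"],
]
affinity = learn_affinities(replays)
print(affinity["Armorsmith"].most_common())
# [('Acolyte of Pain', 2), ('Taskmaster', 1)]
```

Because the pairs are unordered, seeing Acolyte of Pain before Armorsmith in the second replay still strengthens the same affinity as seeing them the other way around in the first.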
The opponent played, let’s say, Deadly Poison and then also Shiv, which are two cards. Then, from there we say, well, what are the affinities of those cards? For Deadly Poison we know that Fan of Knives has been seen 500 times and Blade Flurry has been seen 350 times, so we look at those. And for Shiv we know that Blade Flurry has been seen 400 times, and when you played Shiv we also saw Amani Berserker 400 times. So we combine them and, again, very simple, the simplest strategy is to just take the sum. We say: “How many times did those cards appear?” And we get the ranking. And the ranking is: Blade Flurry is on top because it’s 350 plus 400, which is 750. It’s followed by Fan of Knives with 500. And the third one, which didn’t make it, is Amani Berserker with 400. That’s how it works. It is actually very simple. It turned out that if you do something more complicated, it doesn’t work. We have tried more things, and this one actually works really well.
So, how did we do it? We took 50,000 replays and we built one model per class, because each class has some unique cards and we didn’t want to have cross stream. We ran the code and we learned the thing (see left-hand image). Actually the algorithm ran through the 50,000 replays in less than 3 minutes. It’s not that much work. And then, victory! I was actually shocked at how good the thing was. By turn 3, the highest prediction has a 97% chance to be played, which means that in 97 out of 100 games the first prediction that the algorithm returns at turn 3 will be played. If you want to look a little bit deeper, this is a curve of the average prediction accuracy of the algorithm for 10 predictions, turn by turn (see right-hand image). You can see it actually rise, because it gets more and more information as turns are played. The more cards your opponent plays, the more we can look at affinities, and the accuracy increases.
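The summing strategy from the Deadly Poison and Shiv example above can be written out in a few lines. Again this is only a sketch under stated assumptions: the `affinity` table simply hard-codes the counts quoted in the talk, and the function name `rank_predictions` and its `top_n` parameter are hypothetical.

```python
from collections import Counter

# Co-occurrence counts quoted in the talk for the Rogue example.
affinity = {
    "Deadly Poison": Counter({"Fan of Knives": 500, "Blade Flurry": 350}),
    "Shiv": Counter({"Blade Flurry": 400, "Amani Berserker": 400}),
}


def rank_predictions(affinity, cards_seen, top_n=10):
    """Sum the affinity counts of every card seen so far and rank the result."""
    scores = Counter()
    for card in cards_seen:
        # Counter.update adds counts rather than replacing them.
        scores.update(affinity.get(card, {}))
    return scores.most_common(top_n)


ranking = rank_predictions(affinity, ["Deadly Poison", "Shiv"])
print(ranking)
# [('Blade Flurry', 750), ('Fan of Knives', 500), ('Amani Berserker', 400)]
```

Blade Flurry comes out on top because it has an affinity with both cards the opponent played, even though neither of its individual counts is the largest.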
And then it starts to decrease after turn 8 because there are fewer and fewer cards in the deck of the opponent, so you have less and less chance to be right. So the balance is somewhere between turn 4 and turn 8. If you look at the ranking functions, the ranking functions do work (see right-hand image). Green represents our best prediction, and the orange one represents our lowest-ranked prediction, which is number 10. And you can clearly see that the best one is actually extremely good, up in the high 90s starting with turn 3, whereas the lowest one is barely above 20. And they are converging because the algorithm makes fewer and fewer mistakes but there is also less and less room for errors.
So, that wasn’t all we wanted to tell you guys. I wish we had more time to tell you more (see right-hand image). We wanted to tell you about predicting the game outcome; again, we’ll do a blog post. We are also looking into optimizing decks for mana throughput, because we know that mana advantage is the key factor to win. And also, by popular request, we are looking at hero power comparison and at comparing various types of decks. These are the things we are looking forward to doing. If you have ideas about things you would like us to do, or if you have ideas about things we should do together, please tell us, we’d love to do it.
I would like to finish by saying thanks to the people who give us feedback – thanks a lot! We do read a lot of comments that people post to us. It’s really important because it helps us get better. In particular, we’d like to thank Neil and Zach who spent a lot of time helping us prepare for this talk and also giving us insightful feedback. And also thanks to our anonymous friend who gave us the replay data. Thanks a lot!