I am a legend: Hacking Hearthstone with machine learning


At Defcon 22, Elie Bursztein and his wife Celine present their research on finding undervalued cards in Hearthstone and predicting the opponent’s actions.

Hi everyone! My name is Elie. Today Celine and I are going to talk to you about Blizzard’s new game Hearthstone. Just a quick disclaimer before we get started: this is our personal work, and it does not represent in any way our employer’s point of view. That being said, if you are looking for a new job Google is hiring, so if you’re interested – come talk to me after the talk. Hard to do it, guys… And then, with that out of the way, let’s get started.

A really captivating game
So, Hearthstone is a digital collectible card game released by Blizzard earlier this year. It is based on the universe of World of Warcraft, and it’s an amazingly addictive game. As with anything that interesting, there are sometimes unintended consequences. At some point during the last few months, we became more interested in understanding how the game is structured and building tools for it than in playing it. I remember that from May to June I think I played only up to level 5. But it’s an amazing game. If you haven’t tried it, you should. It’s free to play. It’s available on your computer, Mac and PC, and there is also a mobile version for iPad, and hopefully soon one for Android.

This is also a very good excuse for me, because I finally convinced my wife to come on stage with me for a Defcon talk. That’s something I have wanted for a long time. It’s her first talk, so please be nice to her.

Shortcoming of complexity
Alright, what are we talking about today? We are going to talk about game complexity. Hearthstone has about 500 cards now, and complexity creates bias. And the question is: can we use these biases? Can we exploit them? That’s what this talk is about. We wrote about our research to Blizzard – I know some of them are in the room – but we didn’t get any response, so I think they are fine with it. No answer means “Yes”, right?

What is the research about?
So, the first thing we want to tell you about is how to find undervalued cards. Each card has a value, and the question is: do some cards give you more bang for the buck than others? The second thing we want to tell you is: can you predict what your opponent is going to play? Yes, it’s possible, and we are going to show you how, to a certain extent. We also wanted to tell you about how to predict the game outcome. It’s very interesting, except we don’t have time; 45 minutes is too little. And of course we wanted to tell you about the incoming alien invasion. No, wait; not this one.

The interface for Hearthstone
How many of you have played Hearthstone? Can you please raise your hand? Not everyone. Okay. How many of you reached legend? No one. Okay. Sorry. That being said, for those who have never seen Hearthstone, here is what a normal game looks like (see right-hand image). It’s a board game. It’s turn-based, with two players. Each player is represented by a hero inspired by World of Warcraft characters. You are at the bottom; here I’m playing Valeera, the Rogue hero. And my opponent is Heigan, a hero you only meet in solo play.

The game ends when the health pool of one of the two heroes reaches zero. The health pool is on the right side of your portrait (see leftmost image below). So, how do you kill people? Well, you use decks (see middle image below). And the decks contain cards: 30 cards for each player. It’s a hard limit: not 29, not 31, just 30. Each card has a special effect. Every time you draw a card, it goes into your hand. You can see that one of the cards is highlighted in green (see rightmost image below); this means you have enough mana to play it. Mana is the resource you use to play cards.

Images, left to right: Health pool indication; The decks; Cards in your hand

Your opponent also has a hand (see leftmost image below). You only see the back of it, obviously, because you are not supposed to see other people’s hands. For those who are curious: no, the game does not know the cards ahead of time; and no, you can’t cheat with that, I checked. The mana pool that I mentioned is shown on the right side (see middle image below). It starts at 1 on the first turn and grows by 1 each turn, up to 10. After turn 10 it is still replenished every turn but no longer increases. So the maximum amount of mana you can have is 10.

Images, left to right: Opponent’s hand; Mana pool; Minions in Hearthstone

Cards can serve different purposes. They can be a weapon or a spell, which you play immediately. They can also be what we call Minions, or creatures (see rightmost image above). Each player can have up to 7 of them on the board. Here you can see I have one VanCleef, and my opponent has 3 Minions at the top.

Alright, that’s basically what Hearthstone looks like when you play it. It’s obviously designed to be played on a tablet. They simplified it a lot compared to Magic (Magic: The Gathering). So it’s very simple; they tried to make it very easy for people to get in and get their first free shot before buying cards, which is how they make money.

Hearthstone turn illustration video

Here is a quick video of me playing Hearthstone in normal game mode, just to show you the game (watch video above). You can see this is my turn, so I’m playing; my card does a special effect, brings a new card into the game. I draw a card which goes into my hand, I play another card, and I click End Turn. And then my opponent has a secret, which is a trigger effect. That’s basically what a turn looks like. On the video it looks awesome because the guy played really fast. Sometimes you have up to 90 seconds to play your turn, so you have to wait and drink your coffee. Some people try to actually play 2 games at a time.

Card attributes
What makes this game so interesting is the cards. Everything is a card, and you should look at how the game is structured. Actually, your hero is a card. And the card is the thing we are going to look at in depth in this first part. A card – a Minion here, the Yeti – has 4 attributes (see right-hand image). The first one is the ‘mana’, which is the cost you pay to play the card: as I said, from 0 to 10. And then you have the basic attribute in this card, which is ‘attack’, 4 here. And there’s ‘health’ – 5. So, that’s a card, and this is a very boring card, nothing special with this card.

Cards have special effects
What makes the game fun is that there are lots of cards with interesting effects. These (see right-hand image) are 3 of my favorite cards: VanCleef, the Faerie Dragon and the Cabal Shadow Priest. And what’s very interesting is that you can combine all those effects into unintended or very special combos, and that’s basically how you win the game. So, finding good synergy between cards is one of the fun aspects of the game.

Key assumptions
So, now that we know the cards, the question is whether some cards give a better bang for the buck than others. How do we find those? Well, I started with a theory. We are going to start with some basic assumptions about how to model this. The first one is: the mana cost is proportional to the card power, which means that a 1-mana card is less powerful than a 2-mana card, which is less powerful than a 3-mana card. If it weren’t like that, the game would be broken, because everyone would just play the most powerful cards at 1 mana and overrun people.

Another assumption is: the power of the cards increases roughly linearly, which means a 2-mana card is roughly twice as powerful as a 1-mana card. If this hypothesis were not true, some players would have no chance of success. So rough linearity is something you need for the game to stay balanced between short-term and long-term play. It’s not quite linear, but it’s a reasonable assumption.


The third assumption is: card effects have a constant price. It means that Divine Shield or any other effect has the same price regardless of the card it’s on. So there is no discount for specific cards. That brings us to the fourth assumption: having a card has value. Remember, you only draw one card each turn, so even holding a card has some value, which we call the intrinsic value of the card. And finally, we believe there is no secret. We believe Blizzard has no hidden component, no hidden balancing factor, and what you see on the card is exactly what the value of the card should be. So if you sum all the attributes, you get the value of the card. How you sum them is another question we are going to explore, but there is nothing secret about it; we believe everything is in the game and you can actually look at it. That’s where we start.

Modeling a card
So, how do we model a card? Well, as I said, the mana is the price, and the price is equal to the attack plus the health plus the intrinsic value of my Yeti (see right-hand image). There’s nothing else on the card, so we assume that the mana price is exactly the sum of those attributes. You put it into a linear function (don’t worry, it’s very simple) and you say: 4 mana is 4 attack times ‘a’, where ‘a’ is what we call the ‘attack coefficient’, the base price of 1 attack point in mana. Plus 5 health times ‘h’, where ‘h’ is the coefficient for health, that is, how much 1 health point costs in mana. Plus ‘i’, which is the intrinsic value of the card, that is, how much mana it costs to even hold a card.
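Written out, the model for the Yeti is a single linear equation. The symbols a, h and i are the ones used in the talk; the generic first line is only a restatement of the spoken description, not a formula taken from the slide.

```latex
\begin{align*}
\text{mana} &= a \cdot \text{attack} + h \cdot \text{health} + i \\
4 &= 4a + 5h + i \qquad \text{(Chillwind Yeti, a 4/5 for 4 mana)}
\end{align*}
```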

Comparing two cards
With that, without looking at the coefficients just yet, you can compare 2 cards (see right-hand image). Let’s take a very common card, the Boulderfist Ogre, which is heavily played in Arena. 6 mana is 6 attack, plus 7 health, plus the intrinsic value. And we want to compare it to the Chillwind Yeti. It’s not really easy to compare them directly, so we are going to be very, very hi-tech. We are going to go back to fourth grade and just divide by 6. Yes, that’s really hard, I know… That gives you the value of 1 mana point. So, for 1 mana point you get 1 attack, about 1.17 health, and a share of the intrinsic value of the card. For the Yeti, we divide by 4 instead. For 1 mana point you get 1 attack and 1.25 health, plus a share of the intrinsic value. And here you can immediately see that one of them gets a better bang for the buck. The Yeti gets more health points per mana point than the Boulderfist Ogre. That’s the kind of imbalance we are going to look for.
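The per-mana comparison can be sketched in a few lines of Python. The card stats are the ones quoted in the talk; the `Minion` class and `per_mana` method are illustrative names, not part of the authors' tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Minion:
    name: str
    mana: int
    attack: int
    health: int

    def per_mana(self) -> tuple[float, float]:
        """Return (attack per mana point, health per mana point)."""
        return self.attack / self.mana, self.health / self.mana


# Stats as quoted in the talk.
ogre = Minion("Boulderfist Ogre", mana=6, attack=6, health=7)
yeti = Minion("Chillwind Yeti", mana=4, attack=4, health=5)

for card in (ogre, yeti):
    attack_rate, health_rate = card.per_mana()
    print(f"{card.name}: {attack_rate:.2f} attack and {health_rate:.2f} health per mana")
```

Both cards give 1 attack per mana point, so the whole difference sits in the health column, where the Yeti comes out ahead.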

Another example
Let me give you a more interesting example (see right-hand image). Fireball, my most hated card – I hate to be killed by a fireball. That’s a very simple thing; for those who don’t know what a Fireball is, it’s a big ball of fire that people shoot at you, and you die. That’s what it is. Basically, this one is very simple to model. You pay 4 mana, and you get 6 damage. So, very simple to model: 1 mana is 1.5 damage.

A giant fireball is called the Pyroblast: bigger, stronger, meaner. The Pyroblast is 10 mana, and for 10 mana you get 10 damage. Well, okay, it might be bigger, but the value is not that great. You get 1 damage for 1 mana, and you can immediately see that’s not quite right. With 10 mana, you can play two and a half Fireballs for the price of 1 Pyroblast. So the Pyroblast is a worse deal than the Fireball. What’s interesting is that this is the new version of the Pyroblast. Earlier this year, we had a “pre nerf” version which cost 8 mana, and even then its damage per mana was lower than the Fireball’s: before the adjustment you got 1.25 damage for 1 mana. So, something is not right here, is it?

Fireball and Pyroblast compared
Let’s compare the two (see right-hand image). If 10 damage for 10 mana is the right price for the Pyroblast, it implies that the Fireball should do 4 damage, not 6. And you’re like, well, okay, but the Fireball is properly priced, it should do 6 damage for 4 mana. But in that case I want the Pyroblast to do 15 damage, not 10. Give me back my Pyroblast!
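The arithmetic behind this comparison is small enough to check directly. The sketch below uses only the mana and damage figures quoted in the talk; the function and variable names are illustrative.

```python
def damage_per_mana(damage: int, mana: int) -> float:
    """Damage you get for each mana point spent on a spell."""
    return damage / mana


fireball = damage_per_mana(6, 4)           # 1.5
pyroblast = damage_per_mana(10, 10)        # 1.0
pyroblast_pre_nerf = damage_per_mana(10, 8)  # 1.25

# What each spell "should" deal if it were priced at the other spell's rate.
fireball_at_pyroblast_rate = pyroblast * 4    # 4.0 damage for 4 mana
pyroblast_at_fireball_rate = fireball * 10    # 15.0 damage for 10 mana

print(fireball, pyroblast, pyroblast_pre_nerf)
print(fireball_at_pyroblast_rate, pyroblast_at_fireball_rate)
```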

So there’s an imbalance. Even if you look at simple coefficients, you can see that there are some decisions which you can debate. Of course we don’t have all the data that Blizzard has, and their pricing is probably based on statistical analysis. But spotting this sort of mismatch is the core idea behind finding undervalued cards.

Card evaluation workflow
Okay, so how do we scale that to hundreds of cards? More precisely, we were able to do about 130 cards for this research, because modeling each attribute is a little bit complicated. How do we do that? Well, we model the cards as we did before. Then we use those cards to reverse the coefficients ‘a’, ‘h’ and so forth using linear algebra; don’t worry, it’s not as bad as it sounds. Then we use those reversed coefficients to compute what we call the ‘real value of the card’. And the last step is the easy one: here’s my face value, here’s my real value, you subtract the real value from the face value, and if the result is negative, you have found an undervalued card. It’s as simple as that.

Cards to be modeled
Let me show you what it looks like on a very simple example with 5 cards (see right-hand image). And then, when we know that, I will show you the real results. So, let’s take 3 cards which have Charge. Charge means that the card can attack as soon as it comes into play. So we have the Kor’kron, the Rocketeer and the Commander. And we are going to add 2 more cards which have Divine Shield. The reason why the Argent Commander is in the middle is that it has both attributes. One of the points of using linear algebra is that you can compute the coefficients. As I said, an effect has the same price on every card. So with 5 cards we can make it work, and that’s also why we can compare cards: they are tied together by the attributes they share.

Images, left to right: The equations; Reversing attribute coefficients
So, we do as we did before: we put them into equations (see leftmost image above). Our Kor’kron has 4 attack, 3 health, plus ‘c’, which is the Charge coefficient, plus ‘i’, which is the intrinsic value of the card. And we do that for the 5 cards. Trust me, it’s correct. Then we are going to reverse the attribute coefficients. To do that, we put them into a matrix, which is, basically, a table (see rightmost image above). We fill it like this: for 4 mana you get 4 attack, 3 health points, 1 Charge, 0 Divine Shields and 1 intrinsic value. Then I do the same thing for the Rocketeer, and the same thing for the Argent Commander; here you can see it has both Divine Shield and Charge, so you have 1 and 1. And then you add the remaining 2 cards.

Then you apply 1 line of Python, which is a least-squares fit, and boom, you get the coefficients! The coefficients are: 1 for attack, -1 for health, 2 for Charge, 1 for Divine Shield and 1 for the intrinsic value. And I’m like: dude, that doesn’t make any sense; you can’t have a discount for health. It is because we only have 5 cards. 5 cards will not give you good coefficients because there is too much instability. You need way more. But it’s just an example.
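A minimal sketch of what that "one line" could look like, assuming NumPy's `numpy.linalg.lstsq` is the least-squares routine (the talk does not name the library). Only the Kor’kron’s stats are spelled out in the talk; the other stats are filled in from the 2014 card set and should be double-checked, and the fifth card is not named at all, so Scarlet Crusader here is an assumption. With a different card set the fitted numbers need not match the ones on the slide.

```python
import numpy as np

# Columns: attack, health, Charge, Divine Shield, intrinsic value.
FEATURES = ["attack", "health", "charge", "divine_shield", "intrinsic"]

cards = {
    # name: (mana, [attack, health, charge, divine_shield, intrinsic])
    "Kor'kron Elite":     (4, [4, 3, 1, 0, 1]),
    "Reckless Rocketeer": (6, [5, 2, 1, 0, 1]),
    "Argent Commander":   (6, [4, 2, 1, 1, 1]),
    "Argent Squire":      (1, [1, 1, 0, 1, 1]),
    "Scarlet Crusader":   (3, [3, 1, 0, 1, 1]),  # assumed fifth card, not named in the talk
}

A = np.array([row for _, row in cards.values()], dtype=float)
mana = np.array([cost for cost, _ in cards.values()], dtype=float)

# The "one line": least-squares solution of A @ coeffs ~ mana.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, mana, rcond=None)

for name, value in zip(FEATURES, coeffs):
    print(f"{name:>14}: {value:+.2f}")
```

With five cards and five unknowns the fit is exact, which is one way to see the instability mentioned in the talk: there is no redundancy left to average out a quirk of any single card.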

Determining real price
Now, how do you find the real price? We go back to kindergarten, where you learned how to add stuff (see right-hand image). For the Argent Commander you have 4a + 2h + c + d + i. Basically, you plug in the coefficients: 4*1 + 2*(-1) + 2 + 1 + 1. And the price is 6. So the real value, according to our coefficients, is 6. Well, the card is fair; there is no difference between the face value and the real value, so it’s not interesting.

Okay, let’s try again with one of the most undervalued cards in the game, the Argent Squire. A lot of people say it’s undervalued. Let’s try again. So it’s 1 attack, 1 health, plus Divine Shield, plus the value of the card. So it’s 1*1 + 1*(-1) + 1 + 1. It’s 2. Wait, 2? No, the card is 1, right? And yeah, you’re right – that’s why it’s an undervalued card. It actually should cost twice as much. So even without the coefficients you can always see that this guy is clearly undervalued.
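The two worked examples can be reproduced with the toy coefficients quoted in the talk (1, -1, 2, 1, 1). The function names are illustrative; the sign convention follows the workflow described earlier, where face value minus real value being negative marks a card as undervalued.

```python
# Toy coefficients quoted in the talk for the 5-card example.
COEFFS = {"attack": 1, "health": -1, "charge": 2, "divine_shield": 1, "intrinsic": 1}


def real_value(attack: int, health: int, charge: int = 0, divine_shield: int = 0) -> int:
    """Sum every attribute times its coefficient, plus the intrinsic value."""
    return (
        COEFFS["attack"] * attack
        + COEFFS["health"] * health
        + COEFFS["charge"] * charge
        + COEFFS["divine_shield"] * divine_shield
        + COEFFS["intrinsic"]
    )


def price_gap(face_value: int, real: int) -> int:
    """Face value minus real value; a negative gap means the card is undervalued."""
    return face_value - real


commander = real_value(attack=4, health=2, charge=1, divine_shield=1)  # 6
squire = real_value(attack=1, health=1, divine_shield=1)               # 2

print("Argent Commander:", commander, "gap", price_gap(6, commander))  # gap 0, fair
print("Argent Squire:", squire, "gap", price_gap(1, squire))           # gap -1, undervalued
```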

Undoubted dependency
I posted that online, and we got a lot of good feedback about it. The most important point was that you should take dependencies into account (see right-hand image). And that’s actually true. One of the guys pointed out that Charge should basically be a factor of the attack. Same thing for Windfury: we model it by taking into account the attack of the card. The one we debate a lot is Divine Shield. We don’t know what Divine Shield should be. Is it just a standalone coefficient? Is it related to health, related to attack? It’s really difficult. If you have ideas, let me know.

We also got a comment that actually a card has a budget. And the budget is 2*mana + 1. I have no idea how they came up with this. It’s absolutely reasonable, and when you do it the coefficient looks way better, it is true. I just don’t know how this guy got it, but thanks!
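One way to see why the budget rule looks reasonable is to test it on the two plain minions discussed earlier. Reading "budget" as attack plus health for a minion with no card text is an interpretation, not something stated in the talk, and `stat_budget` is an illustrative name.

```python
def stat_budget(mana: int) -> int:
    """Budget rule proposed by a commenter: 2 * mana + 1 points per card."""
    return 2 * mana + 1


# (mana, attack, health) as quoted in the talk.
plain_minions = {
    "Chillwind Yeti": (4, 4, 5),
    "Boulderfist Ogre": (6, 6, 7),
}

for name, (mana, attack, health) in plain_minions.items():
    print(f"{name}: stats {attack + health}, budget {stat_budget(mana)}")
```

Both minions spend exactly their budget (9 and 13 points), which also explains the later remark that 2 coefficient points are roughly 1 mana point.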

The coefficients
So, after writing code and debugging, you have something, you run it, and voila! You get your coefficients. These are the coefficients we got for 132 cards (see right-hand image). We used the budget idea that was proposed, so 2 coefficient points are roughly 1 mana point. Basically, adding “destroy a minion” to a card costs you 5 mana. Board damage costs about 1.5 mana per point. Drawing a single card costs you roughly 1.5 mana. Divine Shield is pretty expensive, it costs you almost 1 mana point, and so forth.

We also have negative coefficients, which decrease the price of a card. As you would expect, letting your opponent draw a card is the biggest one, followed by discarding cards. And then we have Overload, which is a Shaman mechanic where you pay part of the price on the following turn. So, all of this seems perfectly fine, and we are really happy. And then we have this guy: board damage. We did something we thought was really clever, which was to model single-target damage and multiple-target damage with different coefficients. That is a stupid idea. I’m going to show you why in a few slides, but keep in mind that we tried.

Visualization helps
One way to visualize this is to put it on a graph, where the X axis is the face value that Blizzard assigned to a card, and the Y axis is how much the algorithm believes the card is worth (see right-hand image). The top left triangle contains the undervalued cards, in green; the overpriced cards are in the bottom right triangle. We arbitrarily force cards to be either undervalued or overvalued, because you are not that interested in the ones which are at a fair price. The cards we modeled are mostly the low-cost ones, which share common abilities. It’s not because the algorithm favors one or the other; it’s just how we did the modeling, and you can see on the slide that most of them are on the left side of the graph. Higher-cost cards usually have special abilities not captured by this model.

Results for undervalued cards
So, what is the result? One of the most undervalued cards is Soulfire. We do believe it should actually cost at least 1 mana. Light’s Justice is also undervalued; there was a lot of discussion about this one, but then a lot of people pointed out that when you pick it in the Arena it gives you 4 attacks for 1 mana. Not surprisingly, Mortal Coil, Power Shield and Argent Squire also show up here. So overall the algorithm gives you something which seems reasonable to most people. The one which is a little bit bizarre is Sacrificial Pact; it’s probably a bug in the code.

And do notice we also have high-cost cards. The one which is probably the most powerful is the Fire Elemental, which the algorithm believes is probably worth 7 mana, not 6. That’s something which has also been mentioned before by people just looking at the cards. So the algorithm gives you reasonable results, and therefore we believe we are on the right track. If you want to look at the full details, all the coefficients and all the card ranks, they are on my website; you can just go there. We keep updating this. And if you have ideas on how to make it better, let us know.

So, how do you take it to the next level? Well, it’s really difficult, at least without extra data. And this extra data is how people play the game. The value of most cards depends on the state of the game, and you have no insight into that unless you have lots and lots of replays. Of course, Hearthstone being very new, we don’t have that many replays, and replays are actually not a built-in feature.

No direct access to replays
Fortunately, we got our hands on 100,000 games which were played between May and June. We’d like to thank all of our anonymous friends for that – thank you very much! It’s not a long-term solution, and we really hope that Blizzard will give us the ability to see replays so that we can do our own analysis. I know it’s in the Terms of Service, and letting people look at things should be perfectly fine.

Evaluating Twilight Drake
With this data you can do interesting stuff. The first thing is that you can actually price cards which have unique effects. Let’s start with a very simple example, the Twilight Drake (see right-hand image). The Twilight Drake is a 4/1 for 4 mana, and it has a special effect, a Battlecry: when it comes into play, it gains 1 additional health for each card you have in your hand. Obviously, the value of the card depends on how many cards you have when you play it. So we can build a model where you say: “Well, if I have one card in my hand, then it has 1 extra health and its real value is 1.3 mana. If I have two, it’s 1.9 mana and so forth, up to 9, where it’s 5.9.” Now that you have this table, you need replay data to know how people actually play it. Are they mainly playing it with 8 cards in hand, 4 cards, 2 cards..?

How fair is Twilight Drake's price
So you do it, you plot it, and you get this graph (see right-hand image). What you see is the following: in red on the left side are the games where people play it with fewer than 4 cards in hand, and there the card is overpriced. If you play it with 4 cards in your hand, you get 3 mana of value. With 5 cards, you get 3.6. With 6, you get 4.2. So this is the fair zone, basically; the price of the card is just right. If you play it with 7 or 8, then you get more value than you paid for. The average real value of the card is 3.7. Based on that, we consider the Twilight Drake’s price fair. It’s also interesting that by looking at this we reach the same conclusion as Blizzard, so we think we have something very similar to what they have, except they have way better data and better insight.
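The averaging step can be sketched as a weighted mean: the real value at each hand size, weighted by how often the card was played at that hand size. The three values below are the ones quoted in the talk; the play counts are made up for illustration only, since the replay histogram itself is not given here.

```python
def average_real_value(value_by_hand_size: dict[int, float],
                       plays_by_hand_size: dict[int, int]) -> float:
    """Weighted mean of the card's real value over the observed hand sizes."""
    total_plays = sum(plays_by_hand_size.values())
    if total_plays == 0:
        raise ValueError("no replays to average over")
    weighted = sum(value_by_hand_size[size] * count
                   for size, count in plays_by_hand_size.items())
    return weighted / total_plays


# Real value of Twilight Drake by hand size, as quoted in the talk.
VALUE_BY_HAND_SIZE = {4: 3.0, 5: 3.6, 6: 4.2}

# Hypothetical play counts per hand size (NOT the authors' replay data).
EXAMPLE_PLAYS = {4: 10, 5: 20, 6: 10}

FACE_VALUE = 4
average = average_real_value(VALUE_BY_HAND_SIZE, EXAMPLE_PLAYS)
print(f"average real value {average:.2f} vs face value {FACE_VALUE}")
```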

Edwin VanCleef card analyzed
Let’s look at another card, one of my favorites: Edwin VanCleef (see right-hand image). What VanCleef does is gain 2/2 for each card you played before it in the same turn. So all you have to do is, again, look at the replays and count how many cards were played before it during the turn to get its value. Let’s take a reasonable number of cards played before it; I know you can go way beyond that. As a plain 2/2, it’s worth 1.09, roughly 1 mana. With one card played before it, it’s worth about 3 mana, which is roughly fair. At 6/6 you begin to get some value out of it. And when you play 5 or 6 cards before it, the value of VanCleef is just outrageous.

VanCleef is undervalued
As you can see (right-hand graph), VanCleef is most of the time undervalued. It is. Even if it’s a 6/6, which is 2 cards before it, then you already get, like, a 5 mana worth of your card. And for 4 cards before it, you get a 7 mana worth of your card, and so forth. So the average is 8.1. The average value of VanCleef today is 8.1 based on our data. So I do claim that VanCleef is undervalued, and I believe the right value is between 5 and 7 mana. I know it would make VanCleef harder to play, that’s why I say it should not be 8 or 9. But 3, in my opinion, is way too low; according to these data, it doesn’t make any sense. So, this is probably one of the cards which are the most undervalued.

Looking into Flamestrike
Alright, the last one: Flamestrike (see right-hand image). We got a lot of questions about how you deal with AoE. Among cards which have an Area of Effect, Flamestrike is one of the simplest. It deals 4 damage to every one of your opponent’s minions on the board. It’s easy, right? All you have to do is count how many minions there are on your opponent’s side of the board and multiply that by the damage, and voila, right? So we just did that, and I looked at the numbers and I was like: oops, there’s something completely wrong here.

You notice if you have 2 minions, your card is already worth 13 points of mana – hmm, the modeling is going wrong somewhere. It doesn’t make any sense; you can have a 50 mana worth of card, so what’s wrong? It turned out it’s because we used board damage. Remember I told you we tried to be clever and separated single targets versus multiple targets? And this is what happens when you try to be too clever. Okay, let’s just burn it, let’s try again and go back to single target.

The number of minions matters
Now it makes more sense. That’s how we learned that you should not separate multi-target from single-target damage: you should use only one coefficient, spell damage, not two. And it actually makes sense. The card becomes a good deal when there are 3 minions on the board, and not otherwise (see right-hand image).

The verdict on whether Flamestrike is fairly priced
The graph looks perfectly fair, you can see it visually (see left-hand image). Sometimes it’s undervalued, sometimes it’s overvalued, most of the time it’s between the two. So the card is perfectly balanced. And the lesson here is: don’t try to be too smart; do not split single and multiple targets. That’s one of the lessons learned. We also liked this idea, because actually looking at those cards helped us validate that what we do makes sense and everything is consistent, and when there is something strange we adjust the technique accordingly.

So, let’s switch gears a little bit. We are going to tell you about how you can predict your opponent’s deck, and I’m going to let Celine tell you a little bit about that.

Celine Bursztein: Hello everybody! My name is Celine, and I’m going to show you our in-game tool (see leftmost image below). The tool is a web application written in Python. It runs on Flask, a small web framework. You can display the web page right next to your game, as you can see on the left side of the screen. You can also use any device with a web browser, such as a tablet. This tool implements all the algorithms described in this talk, so you can benefit from them easily.

Images, left to right: The in-game tool; Structure of the dashboard; Viewing game metrics

The main screen in our tool is a real time dashboard (see middle image above). You can use it to track game metrics, played cards, and predict your opponent’s cards during a game. The first box on top displays the game metrics (see rightmost image above). There are 3 metrics. The Mana Advantage is the difference between how much mana you spent and how much mana your opponent spent by playing cards. The Draw Advantage is the difference between how many cards you drew and how many cards your opponent drew. And the Hand Advantage is the difference between how many cards you have in hand and how many cards your opponent has in hand. These metrics appear in green if you have the advantage and in orange if the advantage is for your opponent. In our study we found that these metrics are the most predictive of the game outcome. We won’t go into detail today due to lack of time, but we will do a blog post later about it.
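The three metrics are simple differences, so they can be sketched directly from the definitions above. `PlayerState`, `advantages` and `color` are illustrative names rather than the tool's actual code, the snapshot values are hypothetical, and the "neutral" color for a tie is an assumption, since the talk only describes green and orange.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    mana_spent: int     # total mana spent on cards so far
    cards_drawn: int    # total cards drawn so far
    cards_in_hand: int  # cards currently in hand


def advantages(me: PlayerState, opponent: PlayerState) -> dict[str, int]:
    """Positive numbers mean the advantage is yours."""
    return {
        "mana": me.mana_spent - opponent.mana_spent,
        "draw": me.cards_drawn - opponent.cards_drawn,
        "hand": me.cards_in_hand - opponent.cards_in_hand,
    }


def color(value: int) -> str:
    """Green for your advantage, orange for the opponent's ("neutral" is assumed)."""
    if value > 0:
        return "green"
    if value < 0:
        return "orange"
    return "neutral"


# Hypothetical snapshot: you just spent 1 mana on your first card.
me = PlayerState(mana_spent=1, cards_drawn=4, cards_in_hand=3)
opponent = PlayerState(mana_spent=0, cards_drawn=4, cards_in_hand=4)

for name, value in advantages(me, opponent).items():
    print(f"{name} advantage: {value:+d} ({color(value)})")
```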

Below the metrics, you can see your deck (see leftmost image below). You can see how many cards of each type you have in your deck in the T column, T for Total. You can also see how many of them are currently on the board, in green in the P column, P for Played, and how many are dead, in red in the D column, D for Dead.

Images, left to right: Player’s deck; Opponent stats; Predicted deck

Below your deck, the third box shows which cards your opponent played (see middle image above). Every time your opponent plays a card, it appears in this box, with the total number of cards in the T column, the played cards in the P column, and the dead cards in the D column. And last but not least, the final box shows you a prediction of which cards your opponent is going to play, based on the cards he has played so far (see rightmost image above).

Possible sources of data
So, how did we manage to get the data? We could use packet sniffing to get the game data; it gives you the best data, but it’s a violation of Blizzard’s Terms of Service, so we didn’t use it. We could use OCR, optical character recognition; it gives you good data but it requires a lot of CPU and it’s not very reliable. So we ended up using the debug log. You can start your game in debug mode. It’s a simple way to get data, but it has some limitations: there’s only game data, you cannot see your opponent’s name, player rankings, or which card attacked which card, and there’s no info about decks. So it would be great if Blizzard provided a log system as good as the one in World of Warcraft.

Hearthstone in-game tool in action

Now you will see our tool in action in a short video (see above). The tool is on the left side of the game. I’m playing against Elie. I choose my card, and you will see that Elie starts with an extra card in his hand because he has the Draw Advantage and Hand Advantage. Now I’m picking up a card, so we have the same number of cards. As you can see in the dashboard, there is no more advantage for Elie. So now I’m playing Argent Squire, and you can see the card will appear in green in my deck and I spent 1 mana point, so the Mana Advantage is for me. As the game evolves, the dashboard reflects the changes, and as soon as Elie plays a card a prediction appears at the bottom. You can collapse your deck to get more room for your opponent info. Now I’m going to play Eviscerate. Let’s take a closer look, so Eviscerate was in the prediction. So, yeah, it’s working. At some point he tries to kill me with Leeroy, and good for me I have a Knife Juggler, it saved me. So, I hope you liked it.

The 'Turns' screen
In addition to the real time dashboard, you can also see the current game history in the Turns screen (see right-hand image). It allows you to see what happened turn by turn during the game and learn from your mistakes. Our tool will be available next week on GitHub. We currently need help with improving deck import using OCR, because we are currently using a text file. We also want to display the game history, do the macros, Windows packaging, and improve card modeling.

Before you get your hopes too high and I tell you how the black magic is done, just a word of disclaimer. Because Naxxramas has just been released, the meta is quickly shifting, so the predictions are not accurate. As I said, we don’t have access to a lot of games, so expect the tool to not perform very well for the next few weeks. That being said, it will catch up as soon as play around the newly released cards stabilizes.

Prediction workflow
So, how does it work? Well, it’s very simple (see right-hand image). We model card affinities, that is, which cards are played with each other. Then we have an evaluation function which returns the cards most likely to be played, based on their affinities with the cards the opponent has already played. And then the tool goes through a bunch of replays and learns from those replays the card affinities and other metrics. And then, as Celine showed you, we have the in-game tool which, based on the algorithm, analyzes the cards that have been played and returns the most likely card to appear. That’s basically what it looks like. We are going to go through it one step at a time, and that will be the final part of the talk.

So, card affinities. As I said, Hearthstone has about 500 cards. So if you want to look at all the card combinations, it will be almost impossible. For 30 cards in the deck, we have close to an impossible number of combinations. So what we really do is we exploit the fact that in practice some cards really work well together, and some cards do not work well together. For example, if you play Druid you have a combo, and the combo is Savage Roar plus Force of Nature. So if you have one, it’s likely you have the other. On the other hand, you do not have Force of Nature and, I don’t know, a Murloc – doesn’t make any sense, this is not an affinity that you need to model.
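
To give a rough sense of the scale involved, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes a pool of exactly 500 cards and counts decks of 30 distinct cards, ignoring class restrictions and duplicate copies, so the real number differs. The constant names are made up for illustration.

```python
import math

CARD_POOL = 500  # approximate pool size quoted in the talk
DECK_SIZE = 30   # cards in a deck

# Simplified upper bound: choose 30 distinct cards out of 500.
# Class restrictions and duplicate copies are deliberately ignored.
deck_count = math.comb(CARD_POOL, DECK_SIZE)
print(f"roughly 10^{len(str(deck_count)) - 1} possible decks")

# Pairwise affinities, by contrast, fit in a small table.
pair_count = math.comb(CARD_POOL, 2)
print(f"{pair_count} possible card pairs")
```

Counting pairs instead of whole decks is what keeps the affinity model tractable.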

Recording affinities using bigrams
We use the simplest thing you can think of, which is called n-grams. Here is one of the simplest versions, the bigram (see right-hand image). We look at the replay as the sequence of cards which have been played by the opponent. So we say, well, the Armorsmith has been played and then the guy played the Taskmaster. That’s one bigram, and we record the affinity between the two. We say, well, if you play the Armorsmith, then you are likely to play the Taskmaster. And then we look at the second part of the stream, which is the Taskmaster followed by the Acolyte of Pain, so we record that there’s also affinity between those two cards.
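
The ordered bigram counting described here could be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `record_ordered_bigrams` and the nested-counter layout are assumptions, and card names are written as they are spoken in the talk.

```python
from collections import Counter, defaultdict


def record_ordered_bigrams(replay, affinities):
    """Count each consecutive pair (card, next card) in one replay."""
    for first, second in zip(replay, replay[1:]):
        affinities[first][second] += 1


# affinities[a][b] = how often b was played right after a
ordered_affinities = defaultdict(Counter)
record_ordered_bigrams(
    ["Armorsmith", "Taskmaster", "Acolyte of Pain"], ordered_affinities
)
print(dict(ordered_affinities))
```

With this replay, only the two adjacent pairs are recorded; Armorsmith and Acolyte of Pain are not linked because they were not played back to back.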

Un-ordered bigrams
That being said, we know that cards are drawn at random. You get one card each turn, and it is drawn at random, so the order in which cards are played is partly a matter of luck. To account for that, we use what we call un-ordered n-grams (see right-hand image). And here are un-ordered bigrams. If you play Armorsmith sometime during the game, then later on you are likely to play Taskmaster. If you play the Acolyte of Pain, then you are likely to also play Armorsmith. We also tried trigrams, which are groups of three cards, and so forth. It turns out that the best model is the bigram.
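
A minimal sketch of un-ordered bigram counting, under stated assumptions: every pair of distinct cards seen in the same game is counted once, in both directions. The function name and the choice to count a pair only once per game are illustrative, not taken from the tool.

```python
from collections import Counter, defaultdict
from itertools import combinations


def record_unordered_bigrams(replay, affinities):
    """Link every pair of distinct cards played in the same game."""
    distinct_cards = sorted(set(replay))  # assumption: one count per game
    for a, b in combinations(distinct_cards, 2):
        affinities[a][b] += 1
        affinities[b][a] += 1


unordered_affinities = defaultdict(Counter)
record_unordered_bigrams(
    ["Armorsmith", "Taskmaster", "Acolyte of Pain"], unordered_affinities
)
print(dict(unordered_affinities))
```

Unlike the ordered version, Armorsmith and Acolyte of Pain are now linked even though another card was played between them.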

Ranked predictions
So, how do we evaluate card affinities? Again, very simple (see right-hand image). The opponent played, let’s say, Deadly Poison and then also Shiv, which are two cards. Then, from there we say, well, what are the affinities of those cards? For Deadly Poison we know that Fan of Knives has been seen 500 times and Blade Flurry has been seen 350 times, so we look at those. And for Shiv we know that Blade Flurry has been seen 400 times, and when you played Shiv we also saw Amani Berserker 400 times. So we combine them and, again, very simple, the simplest strategy is to just take the sum. We say: “How many times did those cards appear?” And we get the ranking. And the ranking is: Blade Flurry is on top because it’s 350 plus 400, which is 750. It’s followed by Fan of Knives at 500. And the third prediction, which didn’t make it, is Amani Berserker at 400. That’s how it works. It really is that simple. It turned out that if you do something more complicated, it won’t work. We tried other things, and this one actually works really well.
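
The sum-based ranking can be reproduced with the numbers from this example. This is a sketch only: the `AFFINITIES` table holds just the four counts quoted above, and `rank_predictions` is a hypothetical name.

```python
from collections import Counter

# Co-occurrence counts quoted in the talk's example.
AFFINITIES = {
    "Deadly Poison": Counter({"Fan of Knives": 500, "Blade Flurry": 350}),
    "Shiv": Counter({"Blade Flurry": 400, "Amani Berserker": 400}),
}


def rank_predictions(played, affinities, top_n=10):
    """Sum the affinity counts of every played card and rank candidates."""
    scores = Counter()
    for card in played:
        scores.update(affinities.get(card, Counter()))  # adds the counts
    return scores.most_common(top_n)


ranking = rank_predictions(["Deadly Poison", "Shiv"], AFFINITIES)
for card, score in ranking:
    print(card, score)
```

The output order matches the talk: Blade Flurry (750), then Fan of Knives (500), then Amani Berserker (400).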

Training and testing
So, how did we do it? We took 50,000 replays and we built one model per class, because each class has some unique cards and we didn’t want affinities to cross from one class into another. We ran the code and trained the models (see left-hand image).
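
Training one model per class might look like the sketch below. The replay format, a list of `(hero_class, cards_played)` tuples, is an assumption made for illustration, since the structure of the real replay data is not described in the talk; the sample replays are invented.

```python
from collections import Counter, defaultdict
from itertools import combinations


def train_per_class(replays):
    """Build one un-ordered bigram table per hero class.

    replays: iterable of (hero_class, cards_played) tuples (assumed format).
    """
    models = defaultdict(lambda: defaultdict(Counter))
    for hero_class, cards in replays:
        model = models[hero_class]  # counts never mix across classes
        for a, b in combinations(sorted(set(cards)), 2):
            model[a][b] += 1
            model[b][a] += 1
    return models


# Invented sample data, for illustration only.
sample_replays = [
    ("Warrior", ["Armorsmith", "Taskmaster"]),
    ("Warrior", ["Taskmaster", "Acolyte of Pain", "Armorsmith"]),
    ("Rogue", ["Deadly Poison", "Shiv"]),
]
class_models = train_per_class(sample_replays)
print(class_models["Warrior"]["Armorsmith"])
```

At prediction time you would pick the table that matches the opponent's class and ignore the others.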

Success rate
Actually the algorithm ran through the 50,000 replays in less than 3 minutes. It’s not that much work. And then – victory! I was actually shocked at how good the thing was. By turn 3, the highest prediction has a 97% chance to be played, which means that in 97 out of 100 games the first prediction that the algorithm returns at turn 3 will be played.
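
One way to measure this kind of success rate is sketched below. The exact metric is not spelled out in the talk, so this version makes two assumptions: a "turn" is approximated by the number of cards the opponent has played so far, and a prediction counts as a hit if the card is played at any later point in the same game. All names and the toy data are hypothetical.

```python
from collections import Counter


def top_prediction(played, affinities):
    """Return the candidate with the highest summed affinity, or None."""
    scores = Counter()
    for card in played:
        scores.update(affinities.get(card, {}))
    return scores.most_common(1)[0][0] if scores else None


def success_rate_at(turn, replays, affinities):
    """Share of games where the top prediction made after `turn` cards
    is played later in that game."""
    hits = total = 0
    for cards in replays:
        seen, future = cards[:turn], cards[turn:]
        guess = top_prediction(seen, affinities)
        if guess is None or not future:
            continue  # nothing to predict from, or nothing left to check
        total += 1
        hits += guess in future
    return hits / total if total else 0.0


# Toy data: "B" is always the top guess after "A"; it is right in one game of two.
toy_affinities = {"A": Counter({"B": 3, "C": 1})}
toy_replays = [["A", "B"], ["A", "C"]]
rate = success_rate_at(1, toy_replays, toy_affinities)
print(rate)
```

For an honest estimate, the replays used here should be held out from the ones used to build the affinity tables.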

Average prediction curve
If you want to look a little bit deeper, this is a curve of the average prediction of the algorithm for 10 predictions turn by turn (see right-hand image). You can see it actually rise, because it gets more and more information as turns are played. The more cards your opponent plays the more we can look at affinities and the accuracy increases. And then it starts to decrease after turn 8 because there are fewer and fewer cards in the deck of the opponent, so you have less and less chance to be right. So the balance is somewhere between turn 4 and turn 8.

Ranking functions
If you look at the ranking functions, the ranking functions do work (see right-hand image). Green represents our best prediction, the orange one represents our lowest prediction, which is at number 10. And you can clearly see that the best one is actually extremely good, up in the high 90s starting with turn 3, whereas the lowest one is barely above 20. And they are converging because the algorithm makes fewer and fewer mistakes but there is also less and less room for errors.

Plans for the near future
So, that wasn’t all we wanted to tell you guys. I wish we had more time to tell you more (see right-hand image). We wanted to tell you about predicting the game outcome – again, we’ll do a blog post. We are also looking into optimizing decks for mana throughput, because we know that mana advantage is the key factor to win. And also, by popular request, we are looking at hero power comparison and at comparing various types of decks. These are the things we are looking forward to doing. If you have ideas about things you would like us to do, or if you have ideas about things we should do together, please tell us, we’d love to do it.

I would like to finish by saying thanks to the people who give us feedback – thanks a lot! We do read a lot of comments that people post to us. It’s really important because it helps us get better. In particular, we’d like to thank Neil and Zach who spent a lot of time helping us prepare for this talk and also giving us insightful feedback. And also thanks to our anonymous friend who gave us the replay data. Thanks a lot!
