You can read Part 1 here.

Drafting like a computer is a steep aspirational goal for both humans and the MTG Arena bots. For humans, if we can rely more on rigorous card evaluations and consistent strategies, and less on bad habits (for example, pigeon-holing ourselves into one preferred archetype), then we may expand our ability to draft different decks competently and improve our win rate in the long run. In Part 1 of this series I outlined an example formula which illustrates how drafters could theoretically improve the quality of their selections; coincidentally, that is also how I might program the MTG Arena draft bots to function if I were responsible for encoding their behavior. To summarize that first article: the strategy involved establishing a Day 0 pick order based on static card ratings and then adapting that rating system throughout the draft, in an attempt to put together a cohesive deck that prioritizes both card quality and deck cohesion.

One flaw in that strategy that I outlined is that Limited formats are known to evolve quickly, so a Day 0 pick order based on set reviews may not withstand the test of time. Additionally, any static card ranking can eventually be learned and then exploited by the humans drafting with the bots. As I alluded to in the last article (and as several commenters on that article guessed), artificial intelligence techniques such as machine learning or neural networks are theoretical candidates to address this problem and seem feasible within the MTG Arena environment. Today we’ll explore those and other options for improving our bot card evaluations throughout the lifetime of a Limited set. I do not work for Wizards and have no role in bringing this idea to reality, so just consider this another thought experiment on the possibilities that could someday exist for our drafting and for bots.

Can We Use Machine Learning to Improve MTG Arena Draft Bots?

For our purposes, we are going to define machine learning as creating a draft bot that learns from its past drafting experience through positive and negative feedback. This would allow our bots to adjust their behavior based on trial and error of what works and what does not and, as you would expect with human drafters, to improve their drafting throughout the life of a Limited format.

The two key parts we need in order to establish a machine learning model are: 1) the ability to mine data and 2) criteria to evaluate results as positive or negative. The data collection aspect seems trivial in MTG Arena; the software could (and should) be capable of recording the picks that the draft bots make and the order in which they make them. The bots evaluating their draft decks, however, is the complex part. The most straightforward feedback would be based on the number of wins in the draft, but the bots do not actually play any games with their decks, so that data is not obtainable. If we instead implement other possible measures of a “good” draft deck (also known as a fitness function), such as low converted mana cost, a good curve, or minimizing the number of colors, then we can easily imagine bad draft decks that fit the criteria for good fitness and good draft decks that do not. Overfitting to our tested outcomes could theoretically produce decks that completely miss the synergy aspect of drafting and deck building, and may even end up with bots indistinguishable from the status quo on Arena now. Because of this lack of an objective measure of feedback, I believe that machine learning alone may not be feasible for our goals. I would be interested in your solutions to this problem in the comments, but it seems to me that we will need a solution beyond the genetic-programming style of machine learning that has been used to teach computers to play other games.
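To make the overfitting worry concrete, here is a toy sketch (all card data and scoring weights are invented for illustration, not from any real deck evaluator) of a fitness function that rewards a cheap curve and few colors. A synergistic pair and an unrelated pile of the same shape score identically, so the function cannot tell a cohesive deck from a pile:

```python
# A surface-level fitness function: it scores only curve and color count,
# so synergy is invisible to it. All card data here is invented.

def fitness(deck):
    """Score a deck on mana curve and color count alone -- no synergy."""
    curve_score = sum(1 for card in deck if card["cmc"] <= 3)  # reward cheap spells
    color_penalty = len({c for card in deck for c in card["colors"]})
    return curve_score - color_penalty

# Two toy two-card pools with identical curves and colors.
synergy_pair = [
    {"name": "Goblin Instigator", "cmc": 2, "colors": {"R"}},  # makes tokens...
    {"name": "Goblin Chieftain", "cmc": 3, "colors": {"R"}},   # ...that this pumps
]
unrelated_pile = [
    {"name": "Filler Bear", "cmc": 2, "colors": {"R"}},  # hypothetical filler cards
    {"name": "Filler Ogre", "cmc": 3, "colors": {"R"}},
]

print(fitness(synergy_pair), fitness(unrelated_pile))  # identical scores
```

Any fitness function built from these surface statistics has the same blind spot, which is the core of the objection above.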

Mining Human Draft Data

While we would not have the necessary feedback data from bots, we should have plenty of historic data available from real drafts done by human players. We can harvest the pick order data of these real human drafters to help our bots generate a living pick order that evolves as long as we continue to harvest new data, potentially even evolving throughout the entire format! As the human players adapt to new strategies and synergies, the bots will simply observe and mimic this behavior. In the end, our goal is not to design a perfect draft bot but instead to create an experience that is as close to indistinguishable from an all-human draft as possible. It is probably a fair assumption that picks made by Mythic-ranked players will, on average, be better than picks made by Bronze-ranked players, and we can take advantage of that discrepancy to establish a draft bot difficulty setting, for example.

In part 1, I described the Day 0 draft ratings that are established at the beginning of the format and are based on pre-set evaluations of each card. In this article I will define Day 0’ (as in Day zero prime) as the current rating based on historic human draft data up to the present moment, which the bots will use to make their future card selections. We can establish this Day 0’ evaluation using data from actual human draft results instead of the prescribed card evaluations.

Establishing Day 0’

In a draft, each pack in front of you presents a choice of 1 card from all the available cards. If we simplify our entire card set to only the 15 cards available in your pack, then we could imagine that it is simple to establish a pick order because there are some cards that are clearly better than the other options available. If we presented the same exact pack to 100 different players, we would expect that most would have similar ratings and we would be able to determine a combined ranking of these cards based on the selections that the players made the most. This would be Day 0’ for those 15 cards.

If we expand this example to use all the cards in a set, then enough random players evaluating enough random packs will eventually reveal a pick order for the entire set based on the historic draft selections of actual players. This data set contains novices, experts, and everyone in between. In any large sample there are bound to be inexperienced drafters, so even a card like Ugin, the Ineffable, which is basically a consensus first pick among pro players, would likely max out at being the first pick 99.99% of the time, since it “wins” most of its head-to-head matchups against other cards in the set when presented in 15-card samples in individual packs. We would also expect a different card like Jaya’s Greeting to win many head-to-head matchups because it is a very strong card, but still lose when it is presented in a pack against cards like Ugin or Ral. For example purposes, let’s just say that Jaya’s Greeting wins 70% of its head-to-heads as determined by actual human drafters. Eventually enough random packs will be opened that the overall win percentage of each card in head-to-head matchups can be used to generate a Day 0’ ranking order for the entire set.
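As a rough sketch of how Day 0’ could fall out of mined pick data (the pick log, pack contents, and resulting percentages here are all invented for illustration), each passed card records a head-to-head loss against the chosen card, and the set-wide ranking is simply a sort by win percentage:

```python
from collections import defaultdict

# Hypothetical pick log mined from human drafts: each entry is the set of
# cards seen in a pack and the card the player took. Every passed card
# "loses" a head-to-head against the pick.
pick_log = [
    ({"Ugin, the Ineffable", "Jaya's Greeting", "Wall of Runes"}, "Ugin, the Ineffable"),
    ({"Jaya's Greeting", "Wall of Runes", "Pollenbright Druid"}, "Jaya's Greeting"),
    ({"Ugin, the Ineffable", "Pollenbright Druid"}, "Ugin, the Ineffable"),
    ({"Wall of Runes", "Pollenbright Druid"}, "Pollenbright Druid"),
]

wins = defaultdict(int)
losses = defaultdict(int)
for pack, pick in pick_log:
    for card in pack:
        if card == pick:
            wins[card] += len(pack) - 1  # the pick beats every passed card
        else:
            losses[card] += 1

# Day 0' order: rank every card by its head-to-head win percentage.
win_pct = {c: wins[c] / (wins[c] + losses[c]) for c in wins.keys() | losses.keys()}
day0_prime = sorted(win_pct, key=win_pct.get, reverse=True)
print(day0_prime)
```

With enough real packs, the same computation over millions of picks would converge on a stable ranking for the whole set.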

The major benefit of this technique of training our bots is that in addition to power level evaluation alone, that 70% for Jaya’s Greeting may also reflect a summation of color preference among players, experience level with the set, or other “soft” drafting factors outside of raw power level of the card in a vacuum. In this case we do not even need to establish the fitness function for our bot like we would with machine learning because our bot does not need to understand what is a “good” deck, it is just observing and mimicking the historic behavior of players.

These new base rankings not only generate Day 0’ ratings that our bots could theoretically interpret, they also create an opportunity to provide another source of variance so that our bot draft experience is not entirely formulaic and exploitable. For example, 99.99% of the time our bots will pick Ugin, as they should, but 0.01% of the time they will pick a different card (mimicking a novice drafter). If our bot happens to fall into the 0.01% of drafts where it did not take Ugin, it will move on to the second-highest-rated card in the pack and select Jaya’s Greeting 70% of the time (0.007% overall when both cards are presented in the same pack). If our bot falls into the 0.01% where it did not take Ugin and the 30% where it did not select Jaya’s Greeting (0.003% overall when both cards are presented in the same pack), then our bot is likely imitating a very inexperienced drafter, which is acceptable for our purposes because these players do exist. After our bots have established their full Day 0’ ranking, we can then use the data mined from human draft results to further vary the draft experience. You could imagine that if the choice is between two similarly rated cards, such as Ugin’s Conjurant and Jaya’s Greeting, then the probability of the bot’s behavior forking between the two options is increased. This is a natural defense against exploitation by players, since the behavior of the bots not only reflects historic player evaluation but also contains some degree of chance.
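One way to sketch that cascading, probabilistic pick (the ratings and card names are illustrative, not real mined values): the bot walks the pack from highest-rated to lowest and takes each card with probability equal to its historic head-to-head rate, falling through to the next card otherwise:

```python
import random

# Hypothetical Day 0' head-to-head win rates mined from human drafts.
day0_prime = {
    "Ugin, the Ineffable": 0.9999,
    "Jaya's Greeting": 0.70,
    "Navigator's Compass": 0.35,
}

def bot_pick(pack, rng=random):
    """Walk the pack from best to worst, taking each card with its
    historic rate -- so occasionally the bot mimics a novice pick."""
    ordered = sorted(pack, key=day0_prime.get, reverse=True)
    for card in ordered[:-1]:
        if rng.random() < day0_prime[card]:
            return card
    return ordered[-1]  # fell through every roll: take the last card

rng = random.Random(0)
picks = [bot_pick(["Jaya's Greeting", "Navigator's Compass"], rng) for _ in range(10000)]
print(picks.count("Jaya's Greeting") / len(picks))  # close to 0.70
```

With Ugin in the pack, the cascade reproduces the 99.99% / 0.007% / 0.003% split described above.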

From the Day 0’ selection, we can program our bots to follow the same formula I outlined in the last article for picks after pack 1 pick 1:

Rating = rating base prime x (1.03 ^ (number of cards rated 3.5 or higher drafted in same color) + 1.01 ^ ((14 - cards left in pack) x (pack number)))

Where the 1.03 and 1.01 example values can have some random jitter to make each bot’s behavior unique from the other bots at the table and from past bot behavior.
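A sketch of that adapted rating in code. I am assuming one reading of the formula (the pack number multiplies inside the exponent) and a small jitter range; the constants and the function name are illustrative, not anything from Arena itself:

```python
import random

def adapted_rating(base_prime, same_color_bombs, cards_left, pack_number,
                   synergy_base=1.03, pressure_base=1.01, jitter=0.005, rng=random):
    """One reading of the Part 1 formula, with per-bot jitter on the constants.

    base_prime       -- the card's Day 0' rating
    same_color_bombs -- cards rated 3.5+ already drafted in the same color
    cards_left       -- cards remaining in the current pack
    pack_number      -- 1, 2, or 3
    """
    # Jitter the example constants so each bot at the table behaves
    # slightly differently (an assumption; the range is illustrative).
    s = synergy_base + rng.uniform(-jitter, jitter)
    p = pressure_base + rng.uniform(-jitter, jitter)
    return base_prime * (s ** same_color_bombs + p ** ((14 - cards_left) * pack_number))

# Example: a 0.70-rated card, 3 on-color bombs, 10 cards left in pack 2.
print(adapted_rating(0.70, 3, 10, 2, jitter=0.0))
```

With jitter set to zero every bot at the table would behave identically, which is exactly the exploitable pattern the jitter is meant to avoid.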

Can We Make Arena Bots Even More Complex?

Science alone does not often answer whether we should do something (that’s ethics), but rather focuses on whether we could. So far, everything that I have outlined in both this article and the last should be implemented in MTG Arena for a basic, functioning bot draft experience. More complex augmentations are certainly available, and I think they could be positive additions depending on the expertise and resources available within the Arena team for development. Two of these supplemental complexities involve 1) increased selectivity over which historic player data is used as input and 2) implementing neural networks.

Bot Skill Variation

When we harvest player draft data, we can use the immense amount of historic data to vary our bots’ draft behavior based on player skill level. We would expect more inexperienced drafters at the Bronze rank on Arena, with player skill increasing as we climb towards Mythic. We could program our bots to reflect this change in player skill and mimic that variation themselves: bots in Bronze draft pods would harvest data only from Bronze-level players, and bots in Mythic draft pods only from Mythic players. This would result in divergent behavior between the two groups of bots, similar to an “easy mode” and “hard mode” difficulty, with every other rank acting as a different intermediate difficulty. Theoretically you would still expect the bot Day 0’ rankings to be similar at the extremes; Ugin is likely still the top-rated card in both groups. But cards towards the middle of the rankings are more controversial and might vary greatly between the two extreme skill levels. For example, Jaya’s Greeting may sit at 65% in the Bronze data but 80% among the Mythic tier, while a card like Navigator’s Compass might be 35% at the Bronze level and 5% at the Mythic level. There could be Day 0’ Bronze, Day 0’ Gold, and Day 0’ Mythic ratings that each provide a different player experience. If this variation were implemented, it would provide a different draft experience for novice players than for experienced players, and it would allow players to “level up” alongside those around them. This approach still lets Bronze players end up with decks of a similar level to other Bronze players in the matches that follow the draft (and likewise for Mythic players), but again makes it more difficult to exploit the ratings because they are slightly different at each level. A strategy or an entire archetype might be drafted differently at Mythic than at Bronze.

You could even randomize the skill level at the draft pod so that some of the bots are Bronze, some Gold, some Mythic, etc. If Wizards of the Coast begins to collect this data, I imagine it would have other uses as well; for example, it may be informative (or at least cool) for development to see which cards act as new-player “traps” that experienced players are able to evade. If Wizards ever decides to end their archaic data embargoes, then players could analyze this data at set rotation and glean player patterns or preferences that differ between rankings.
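A minimal sketch of how those tiered tables might fit together (the percentages follow the examples above; the table structure, function name, and pod size are my invention), including a randomized mixed-skill pod:

```python
import random

# Hypothetical per-rank Day 0' tables, mined separately from each ladder
# tier; percentages are the illustrative values from the text.
day0_prime_by_rank = {
    "Bronze": {"Jaya's Greeting": 0.65, "Navigator's Compass": 0.35},
    "Mythic": {"Jaya's Greeting": 0.80, "Navigator's Compass": 0.05},
}

def pod_ratings(pod_rank):
    """A bot seated in a pod of the given rank uses that tier's table."""
    return day0_prime_by_rank[pod_rank]

# Mixed-skill pod: randomly assign a tier to each of 7 bot seats.
rng = random.Random(0)
pod = [rng.choice(sorted(day0_prime_by_rank)) for _ in range(7)]

# The same card is valued very differently at each difficulty level:
for rank in ("Bronze", "Mythic"):
    print(rank, pod_ratings(rank)["Navigator's Compass"])
```

Each seat then runs the same pick logic as before, just against its own tier’s table, which is what makes the pods feel different without any other code changes.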

Neural Networks

If we want to get very complex, MTG Arena may generate enough data for us to repeat the base rating process for every single pick throughout the entire draft (Day 0’ Bronze P1P1 vs Day 0’ Bronze P1P2 vs Day 0’ Bronze P2P2, etc.). If we look only at the packs where players selected Jaya’s Greeting as P1P1, we can then rank all of the cards in the set for P1P2. This is similar to the drafting process (as I understand it) in Eternal, where the pack I pick from goes into a pool that some other player receives at a future point, and I then receive a different pack from a different pool from a different player to make my next pick. MTG Arena can use the immense data available to segment off the drafts where Jaya’s Greeting was P1P1 and use that historic drafting data to automate the Eternal-style process. I would expect many of the card rankings for Day 0’ P1P2 to be similar to the Day 0’ P1P1 rankings, but perhaps a card like Casualties of War has now dropped from 65% to 55% because of the perceived increase in difficulty in casting it among players who received that exact sequence of picks. So the Jaya’s Greeting P1P1 acts as the first node in our neural network, our P1P2 will be the second node, and an entire draft will have 45 nodes. As our collection of historic draft data grows, the pathways between some pairs of nodes will be strengthened and others will not, and we can train our bot to follow the path most traveled among those available. In many cases there will be a historic drafter who followed that exact path of picks and can help our bots make each decision along the way. As we travel deeper into the draft, the odds of an exact repeat pathway decrease; if there is no historic drafter that our bot can follow, it can instead match up with the historic drafters who made the most similar choices and mimic their subsequent picks.
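A rough sketch of that path-conditioned lookup (the draft logs are invented for illustration, and this simple prefix back-off stands in for a real neural network): filter historic drafts by the bot’s pick path so far, and fall back to shorter prefixes when no drafter matches the path exactly:

```python
from collections import Counter

# Toy historic draft logs (invented): each list is one drafter's picks in order.
historic_drafts = [
    ["Jaya's Greeting", "Casualties of War", "Shock"],
    ["Jaya's Greeting", "Shock", "Wall of Runes"],
    ["Jaya's Greeting", "Shock", "Casualties of War"],
    ["Ugin, the Ineffable", "Shock", "Wall of Runes"],
]

def next_pick_counts(path, pack):
    """Count what historic drafters who shared `path` took next from the
    cards in `pack`, backing off to shorter prefixes when nobody matches."""
    for prefix_len in range(len(path), -1, -1):
        followers = [
            d[prefix_len] for d in historic_drafts
            if len(d) > prefix_len
            and d[:prefix_len] == path[:prefix_len]
            and d[prefix_len] in pack
        ]
        if followers:
            return Counter(followers)
    return Counter()  # no historic guidance at all

# Drafters who also took Jaya's Greeting P1P1 mostly took Shock next:
print(next_pick_counts(["Jaya's Greeting"], {"Shock", "Casualties of War"}))
```

In a real implementation each matched prefix would be weighted by how many drafters traveled it, which is exactly the “strengthened pathway” idea above.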

This option requires acquiring and storing a lot of data, plus knowledge of implementing neural networks, which may be outside the resources and expertise available to the Arena team. It’s undeniable, though, that actual historic player picks for the entire draft are the most precise way for a bot to mimic player behavior. Neural networks would have the added benefit that our bots would “know” when to prioritize mana fixing and when to draft cards that combo together, and the increased realism would be very useful both for players preparing for a Mythic Championship and for players new to the game. I could even imagine this technology used in coverage, where a featured player’s draft pack is shown on screen and the different cards in the pack are shown with historic “stats” so we can see if the pro is following the expected path. This could generate excitement when (or if) the featured drafter deviates from the picks predicted by the historic player hive mind, similar to how the pocket camera in poker coverage allows broadcasts to show the odds of winning a hand.

Conclusions

Maybe my improvements to MTG Arena are not feasible, or maybe not even possible, but this futuristic utopia could preserve the advantages of bot drafts while adding the benefits and realism of live drafting. It should be obvious that I do not share Wizards’ trepidation about making data available to the players. In fact, I would like to see Wizards embrace these types of advanced technology, and I would analyze this data myself if it were ever made available, but maybe this type of openness is too much to ask for the foreseeable future.

Is this neural network possible without requiring a super-computer? Do you think bot difficulty should vary between Bronze and Mythic rankings? Would you like mined draft data to be available for players to analyze? Let me know in the comments!