The health of Standard has been called into question recently. Three weekends ago (with GPs in Santiago and Montreal), Marvel decks dominated. That dominance continued last weekend (with GPs in Manila, Amsterdam, and Omaha), even if the picture wasn’t as clear as before.

On the one hand, Marvel decks put a lot of players near the top of the standings, and the archetype won GP Omaha in the hands of Brad Nelson. On the other hand, Marvel was played by a large part of the field, so good results were to be expected, and it didn’t dominate the Top 8 in Manila or Amsterdam.

To get a better understanding of Standard, I simply turned to the data.

Methodology

As I was running the GP Amsterdam text coverage, I had access to the deck lists, so I recorded the deck choices of all competitors who started with records of 7-2 or better on Sunday. This information allowed me to transform a result like “Simon Nielsen defeats Oliver Tiu” into “Jeskai Vehicles defeats Temur Marvel.” Then, with some handy spreadsheets, I could categorize all results between players whose archetype I had recorded and add everything up to find the matchup percentage between any two deck archetypes.
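The tallying step can be sketched in a few lines of Python instead of a spreadsheet. This is only an illustration of the idea, not my actual spreadsheet: the function names (`tally_matchups`, `win_percentage`) are made up for this sketch, and apart from the Simon Nielsen result quoted above, the players and results are hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical sample data. Only the first result comes from the article;
# "Player A", "Player B", and "Player C" are invented for illustration.
archetype_of = {
    "Simon Nielsen": "Jeskai Vehicles",
    "Oliver Tiu": "Temur Marvel",
    "Player A": "Temur Marvel",
    "Player B": "Jeskai Vehicles",
}
results = [  # (winner, loser)
    ("Simon Nielsen", "Oliver Tiu"),
    ("Player A", "Simon Nielsen"),
    ("Player B", "Oliver Tiu"),
    ("Player C", "Simon Nielsen"),  # Player C's deck was not recorded
]

def tally_matchups(results, archetype_of):
    """Count wins per (deck, opposing deck) pair.

    Results involving a player without a recorded archetype are skipped,
    and so are mirror matches, which are 50% by definition.
    """
    wins = defaultdict(int)
    for winner, loser in results:
        if winner not in archetype_of or loser not in archetype_of:
            continue
        deck, opposing_deck = archetype_of[winner], archetype_of[loser]
        if deck != opposing_deck:
            wins[(deck, opposing_deck)] += 1
    return wins

def win_percentage(wins, deck, opponent):
    """Empirical match win percentage of deck against opponent, or None without data."""
    won, lost = wins[(deck, opponent)], wins[(opponent, deck)]
    total = won + lost
    return None if total == 0 else 100 * won / total

wins = tally_matchups(results, archetype_of)
print(win_percentage(wins, "Jeskai Vehicles", "Temur Marvel"))
```

In this toy sample, Jeskai Vehicles is 2-1 against Temur Marvel, so the last line reports roughly 66.7%.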

This gave me hundreds of useful matches from Grand Prix Amsterdam, and I combined the outcomes with a similar analysis that Walter Witt performed for all Day 2 decks at Grand Prix Santiago.

While doing so, one issue I ran into was deciding how granular the deck archetypes should be. Take, for instance, Jeskai Vehicles and Mardu Vehicles. Should I lump them together or not? This is not trivial. These decks have a lot of cards in common, but swapping Unlicensed Disintegration for Spell Queller will surely affect a number of matchups.

As always with modeling, the goal is to get a tractable representation of the real world by disregarding relatively unimportant details. To strike a balance, I eventually lumped together all decks with Toolcraft Exemplar and Heart of Kiran under a single “Vehicle decks” classification. I lose some detail, but more fine-grained archetypes would lead to an unmanageably large matchup matrix and, due to the small sample size, statistically insignificant matchup results.

I ended up with 6 broad Standard archetypes. For all matchups that had at least 30 matches in my sample, I retained the empirical match win probability, rounded to the nearest integer percentage. For all other matchups, I took the expectation after a Bayesian updating procedure that started from a belief that a matchup is 20% with probability (w.p.) 0.02, 30% w.p. 0.10, 40% w.p. 0.20, 50% w.p. 0.36, 60% w.p. 0.20, 70% w.p. 0.10, and 80% w.p. 0.02. I’ll go over this methodology in another article at some point, and the initial beliefs are fairly arbitrary, but the basic idea is to correct for small sample sizes. For instance, energy aggro decks went 4-9 against blue-red control decks, but my method gives 41% as the expected match win percentage rather than the empirical 31%.
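For readers who want to see the correction in action, here is a minimal Python sketch. It assumes the update is plain Bayes’ rule with a binomial likelihood over the seven candidate win rates listed above; the function name `expected_win_rate` is made up for this sketch. Under that assumption, a 4-9 record does come out at the 41% quoted above.

```python
# Prior beliefs from the article: candidate match win probability -> weight.
PRIOR = {0.2: 0.02, 0.3: 0.10, 0.4: 0.20, 0.5: 0.36, 0.6: 0.20, 0.7: 0.10, 0.8: 0.02}

def expected_win_rate(wins, losses, prior=PRIOR):
    """Posterior mean of the match win probability after a wins-losses record.

    Assumption: each match is an independent coin flip with unknown bias p,
    so the likelihood of the record is proportional to p**wins * (1-p)**losses.
    The binomial coefficient is identical for every p and cancels out.
    """
    weights = {p: w * p**wins * (1 - p)**losses for p, w in prior.items()}
    total = sum(weights.values())
    return sum(p * w for p, w in weights.items()) / total

# Energy Aggro went 4-9 against U/R Control: empirically 31%, but after
# blending with the prior the estimate moves toward 50%.
print(round(100 * expected_win_rate(4, 9)))  # 41
```

With no data at all, the function simply returns the prior mean of 50%, which is the behavior you would want from a small-sample correction.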

The Results

Here is the breakdown of the overall (expected) match win percentages from these two recent Grand Prix tournaments. As an example of how to read this table: Black-Green won 54% of its matches against Marvel in the data set, indicating that Black-Green is slightly favored in this matchup.

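| Row deck vs. column deck | Marvel decks | Vehicle decks | Black-Green | Energy Aggro | U/R Control | Zombies |
|--------------------------|--------------|---------------|-------------|--------------|-------------|---------|
| Marvel decks             | 50%          | 49%           | 46%         | 56%          | 49%         | 61%     |
| Vehicle decks            | 51%          | 50%           | 34%         | 46%          | 52%         | 61%     |
| Black-Green              | 54%          | 66%           | 50%         | 55%          | 53%         | 49%     |
| Energy Aggro             | 44%          | 54%           | 45%         | 50%          | 41%         | 62%     |
| U/R Control              | 51%          | 48%           | 47%         | 59%          | 50%         | 63%     |
| Zombies                  | 39%          | 39%           | 51%         | 38%          | 37%         | 50%     |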

To clarify archetype labels:

  • Marvel decks include Temur, Sultai, and 4-color variants.
  • Vehicle decks include all color combinations with Toolcraft Exemplar and Heart of Kiran.
  • Black-Green collects everything from Energy versions to Delirium versions.
  • Energy Aggro includes R/G Pummeler, Temur Energy, and other non-Marvel decks with Attune with Aether and Harnessed Lightning.
  • U/R Control is self-explanatory.
  • Zombies encompasses both mono-black and white-black lists.

When I look at a table like this, I always wonder what the equilibria might be. An equilibrium is a metagame (i.e., a probability distribution over the set of possible decks) in which no player can improve by switching decks. Concretely, every deck that appears with positive probability in an equilibrium metagame has an expected win percentage of exactly 50% against that metagame, and no deck has an expected win percentage of more than 50% against it.

If I suppose that these 6 archetypes represent all available choices in Standard and assume that the matchup numbers are accurate, then I can find one equilibrium: 13/17 Black-Green, 1/17 Blue-Red Control, 3/17 Zombies. You can verify that each of these 3 decks would have a 50% match win percentage in this equilibrium.
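If you would rather let a computer do the verification, the following Python sketch computes every archetype’s expected match win percentage against that metagame. The matrix is copied from the table above; the names `WIN`, `METAGAME`, and `win_rate_vs_metagame` are just labels chosen for this sketch, and exact fractions are used so that “50%” really means 50%.

```python
from fractions import Fraction

DECKS = ["Marvel decks", "Vehicle decks", "Black-Green",
         "Energy Aggro", "U/R Control", "Zombies"]

# WIN[i][j] = match win percentage of DECKS[i] against DECKS[j] (from the table).
WIN = [
    [50, 49, 46, 56, 49, 61],
    [51, 50, 34, 46, 52, 61],
    [54, 66, 50, 55, 53, 49],
    [44, 54, 45, 50, 41, 62],
    [51, 48, 47, 59, 50, 63],
    [39, 39, 51, 38, 37, 50],
]

# The candidate equilibrium: share of the field for each deck that is played.
METAGAME = {
    "Black-Green": Fraction(13, 17),
    "U/R Control": Fraction(1, 17),
    "Zombies": Fraction(3, 17),
}

def win_rate_vs_metagame(deck, metagame):
    """Expected match win percentage of deck against a field drawn from metagame."""
    row = WIN[DECKS.index(deck)]
    return sum(share * row[DECKS.index(opponent)] for opponent, share in metagame.items())

for deck in DECKS:
    print(f"{deck}: {float(win_rate_vs_metagame(deck, METAGAME)):.1f}%")
```

Run as written, the three equilibrium decks come out at exactly 50%, and none of the other three archetypes gets above 50% (Marvel decks land at 830/17, or about 48.8%).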

Marvel decks, however, would be 48.8% against this metagame. So once the metagame eventually settles there, you wouldn’t have a good reason to play Marvel in a competitive event! I was quite surprised to see this.

What’s more, it seems to be the only equilibrium. In general, a game may have multiple equilibria, but the online tool I used (based on this algorithm) is supposed to find all equilibria and it only found one.
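As a cross-check that doesn’t rely on the online tool, here is a brute-force Python sketch. To be clear, this is not the algorithm behind that tool. It is a simple support enumeration: try every subset of decks, solve for the shares that make each deck in the subset exactly 50% against the mix, and keep the mix if all shares are positive and no other deck beats it. The helper names (`solve_unique`, `find_equilibria`) are made up for this sketch, and subsets whose equations don’t pin down a single solution are skipped, so it could miss equilibria in degenerate cases.

```python
from fractions import Fraction
from itertools import combinations

DECKS = ["Marvel decks", "Vehicle decks", "Black-Green",
         "Energy Aggro", "U/R Control", "Zombies"]
WIN = [  # WIN[i][j] = win percentage of DECKS[i] against DECKS[j]
    [50, 49, 46, 56, 49, 61],
    [51, 50, 34, 46, 52, 61],
    [54, 66, 50, 55, 53, 49],
    [44, 54, 45, 50, 41, 62],
    [51, 48, 47, 59, 50, 63],
    [39, 39, 51, 38, 37, 50],
]

def solve_unique(rows, n):
    """Exact Gauss-Jordan elimination on rows of n coefficients plus a right-hand side.

    Returns the solution, or None if the system is inconsistent or not unique.
    """
    rows = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(n):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # free variable, so no unique solution
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        pivot_value = rows[rank][col]
        rows[rank] = [v / pivot_value for v in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    if any(row[n] != 0 for row in rows[rank:]):
        return None  # a leftover equation reads 0 = nonzero
    return [rows[i][n] for i in range(n)]

def find_equilibria(win, decks):
    """Enumerate supports and return every metagame that passes the equilibrium test."""
    n = len(decks)
    found = []
    for size in range(1, n + 1):
        for support in combinations(range(n), size):
            # Every deck in the support must be exactly 50% against the mix...
            rows = [[win[i][j] - 50 for j in support] + [0] for i in support]
            rows.append([1] * size + [1])  # ...and the shares must sum to 1.
            shares = solve_unique(rows, size)
            if shares is None or any(s <= 0 for s in shares):
                continue
            # No deck, inside or outside the support, may exceed 50%.
            if all(sum(win[i][j] * s for j, s in zip(support, shares)) <= 50
                   for i in range(n)):
                found.append({decks[j]: s for j, s in zip(support, shares)})
    return found

for metagame in find_equilibria(WIN, DECKS):
    print({deck: str(share) for deck, share in metagame.items()})
```

On the table above, this sketch also finds exactly one metagame: 13/17 Black-Green, 1/17 U/R Control, and 3/17 Zombies. Because the matrix is just a Python list, it is easy to edit a number or delete an archetype and rerun the search.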

Some Caveats

These results were interesting, but you should take them with a grain of salt. There are some issues with my approach:

  • The outcome is highly dependent on the input numbers. Change the matchup that Marvel decks have against Black-Green from 46% to 51% (so that Black-Green drops from 54% to 49% in return), and the equilibrium changes completely to 3/5 Marvel, 1/5 Black-Green, and 1/5 U/R Control.
  • The archetype definitions are simplified. A more accurate model might split up the Marvel decks into anti-Marvel Marvel, anti-Vehicle Marvel, anti-B/G Marvel, and so on. Such a model could capture the notion that if B/G decks became 13/17 of the field, then surely everyone would hop over to anti-B/G Marvel to get a favorable matchup.
  • There are many other archetypes (Anointed Procession, New Perspectives, White-Blue Flash, and so on) that aren’t included in the analysis but that might be able to prey upon certain metagames.
  • The numbers are based on GP Santiago and GP Amsterdam, which were held two weeks apart, and decks underwent some changes in the meantime. For instance, Vehicle decks and U/R Control decks didn’t do particularly well against Marvel in Santiago, but they went 27-16 and 9-5, respectively, against Marvel in Amsterdam, indicating that you can beat the deck with the right build and the right countermagic.
  • Few pros were running Zombies in these events, which may have weakened its performance compared to other decks.

I did the best I could with the data available, but I hope it’s clear that you shouldn’t take my conclusions too far. The sample size is simply too small, even after making the already questionable inclusion of GP Santiago. Think of it as a nice thought experiment more than anything else.

Is the Current Standard a Good Format?

Half a year ago, I wrote an article that formalized my notion of a good format. I’ll check the properties I introduced for the equilibrium I found.

My first criterion was diversity. The Standard format model I analyzed was not diverse, but not because of Marvel—it’s because Black-Green takes a huge (13/17) slice of the equilibrium metagame. But in this equilibrium metagame, a variety of macro-archetypes would be at least 48% to win a match in expectation: aggro in Zombies, midrange in Black-Green, control in Blue-Red, and combo in Marvel. So a variety of archetypes would be viable in a competitive setting, which is nice.

My second criterion was dynamism. The Standard format model I analyzed was not dynamic because there’s no way to get 60% or better against Marvel. The best you could get was 54% by playing Black-Green. When there is no outstanding way to prey upon an archetype, it is hard for the metagame to evolve over time. It is worth pointing out that Vehicle decks and U/R Control posted better numbers against Marvel when looking at Amsterdam alone, but counterplay to Marvel is still narrow and not overwhelmingly effective.

My final criterion was that gameplay was skill-intensive and fun. For the equilibrium metagame (B/G, U/R, and Zombies), the awesome GP Amsterdam finals between Lukas Blohon’s B/G deck and Benjamin Luft’s U/R deck indicate that gameplay would indeed be skill-intensive and fun. The matchup was filled with impactful decisions.

But if Marvel dominated, then some of that desirable gameplay might disappear. Although I personally enjoy the tension and high variance corresponding to a Marvel spin when it occurs in moderation, it shouldn’t decide too many games. You also don’t need a lot of skill to win with a turn-4 Ulamog, but for what it’s worth, a turn-4 Ulamog only happens in approximately 9.4% of the games, as I showed in the Top Stories of Grand Prix Amsterdam.

Conclusion

So what can you conclude from this? Is Standard solved with Black-Green as the best deck now? Did Shaun McLaren truly break it? Should everyone ditch Marvel now? Of course not. The “caveats” section hopefully made that clear. But despite the small sample size of my analysis, there is one reasonable take-away: The data gave no reason to conclude that Marvel is the undisputed king of Standard or that it deserves a ban.

If Aetherworks Marvel were banned, the resulting 5-archetype format from my analysis would have the exact same equilibrium: 13/17 Black-Green, 1/17 Blue-Red Control, 3/17 Zombies! While that clearly doesn’t match reality at this point, there are real costs associated with a ban, and I am not convinced that they would be worth paying.