A while ago, we were discussing in our team forums how our preparation has related to our results. Someone pointed out that, at certain times, our preparation has been good (Nagoya, Barcelona) and our results haven’t, and in some other circumstances (namely Montreal) our preparation was not great but we were very successful.
For Montreal, some of the team members weren’t even there, others were too busy playing poker, others spent a long time eating, and we ended up getting multiple rooms in a hotel without a good place to play—all of which goes against what we imagine constitutes good testing. In the end, we still put three people in the Top 8 and one in the Top 16.
That raised a question—are we, perhaps, placing too much value on things that are not as important? Should we change our definition of what “good testing” is, or were those tournaments just anomalies?
In Magic, it’s not uncommon to face this exact situation—one in which you think something is right, but evidence doesn’t seem to support it. Be that in making a play, choosing a deck, or finding a better way to playtest. When that is the case, what do you do? Do you follow what seems logically right to you, or do you follow empirical results? This is what today’s article is about.
In general, there are two good ways to analyze something: theoretically or empirically. Imagine we’re going to flip a coin, and we want to know what’s going to happen. Theoretically, we can see that there are two sides, and each has an equal chance of being the one that comes up, so we’re 50/50. We can also flip the coin ten million times, and the results will likely come out at roughly 50/50.
What we should not do, though, is flip the coin three times, see that it’s tails those three times, and then conclude the coin is probably going to come up tails the next time. That is being results-oriented. This is what you do when you justify a bad decision because “last game it worked”.
For instance: two of my friends are playing the Mono-Red mirror. Player 1 is at a low life total and has some creatures in play, all of which have two toughness. Player 2 is at 2 life, has an [card]Blinkmoth Nexus[/card] and a [card]Pyroclasm[/card] that he is going to cast this turn, and nothing else in play or in hand besides lands.
Player 2 animates [card]Blinkmoth Nexus[/card] to attack. Then player 2 realizes what he has done. Doing things in this order is just going to kill his [card]Blinkmoth Nexus[/card] for no reason, and it’s much better to [card]Pyroclasm[/card] first, and then animate it and attack.
Player 2 “Err, I didn’t mean to do that, can I take it back?”
Player 1 “No, sorry”.
Player 2 shrugs, attacks with [card]Blinkmoth Nexus[/card], and [card]Pyroclasm[/card]s it away.
Player 1 untaps and draws [card]Molten Rain[/card]. His opponent is at 2, but has only Mountains in play—the [card]Blinkmoth Nexus[/card] is gone. A couple turns later, player 2 draws some creatures and wins.
Now, what can we get from this story? Is it that you should always kill your own creatures with [card]Pyroclasm[/card] as much as you can? Is it that you should always let your opponent take back his plays if he asks nicely? No, that would be results-oriented thinking. The real moral here is that the play that wins is not necessarily correct. If this person arrives at the exact same situation in the future, should he [card]Pyroclasm[/card] his [card]Blinkmoth Nexus[/card] away again? No!
In sum: sometimes you have two options—one play gives you a 70% chance to win, the other a 30% chance. It does not matter how many times you make the 70% play and fail; you should still make the 70% play next time.
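The logic above can be sanity-checked with a quick simulation (a sketch in Python; the 70/30 odds are just the hypothetical numbers from this example): a short streak can go either way, but over many games the win rates converge to the underlying probabilities, so the 70% play stays correct no matter what the last few games looked like.

```python
import random

random.seed(7)  # fixed seed so runs are reproducible

def win_rate(p_win: float, trials: int) -> float:
    """Simulate `trials` independent games, each won with probability p_win."""
    wins = sum(1 for _ in range(trials) if random.random() < p_win)
    return wins / trials

# Over a handful of games, the 70% play can easily "fail"...
print(win_rate(0.70, 3))        # could be anything: 0.0, 0.33, 0.67, 1.0
# ...but over many games it reliably beats the 30% play.
print(win_rate(0.70, 100_000))  # roughly 0.70
print(win_rate(0.30, 100_000))  # roughly 0.30
```

Three games tell you almost nothing; the underlying probability is what decides who wins in the long run.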
This seems rather obvious, but a lot of people don’t do it. You see this most with mulligans. Players will keep a good hand with two lands and five spells, never draw a third land, and then mulligan the next time they’re faced with the same hand. Sometimes you will keep a hand that you know you shouldn’t, but it’ll work out—you’ll draw two lands in a row and win. Then, next time, you will keep the hand again because it worked out. Had it not worked out, you wouldn’t have kept it. Whether it works or not should have no bearing on your decision to keep the hand in the future.
If it is logical that you should not be results-oriented in this situation, then why do people do it? Good question. It’s probably psychological—we remember remarkable events more vividly, and those situations are remarkable precisely because they are unusual. They’re the anomalies. It also gives us an easy way out—we don’t have to think, we’ll just do whatever worked last time.
At the same time, it’s silly to say “results don’t matter.” We’ve all watched Moneyball (we have, haven’t we?) and we know it’s the most important thing. We want to win, after all. It doesn’t matter if anyone thinks we shouldn’t be winning. If you win, it’s hard to convince yourself you’ve done something wrong—and, in fact, that you may have won because you did something wrong. How, then, do we deal with this?
The key to knowing when you should trust what seems right to you and when you should trust what is happening, should those differ, is based on the analysis of two factors: how strong your theory is, and how strong your results are. The stronger the theory, the stronger the results you need to go against it.
In general, the sounder the theory, the more empirical evidence you need before going against it. If your theory is very weak, or if you don’t have one, then being results-oriented starts to look more interesting, and you need less and less data to adopt that approach, because going for something that has a decent chance of being right is better than not having a clue what to do. (In Magic, obviously; in many facets of life you would rather not make a decision if the information you have is not reliable, but you have to make a play and you have to pick a deck, so you might as well let yourself be guided by whatever you can find.) If your theory is very strong, then you need very strong empirical evidence to change your mind.
Strength of Theory
To know how strong your theory is, you have to see where it comes from. What is leading you to believe that you should [card]Lightning Bolt[/card] that [card]Wild Nacatl[/card]? Are you particularly experienced with the archetype? Are you a good player and you feel like Bolting it is right? Have you read about it from someone who is very good? Did you do the math and realize that if you don’t Bolt Nacatl now you’ll not be able to Bolt it for the next five turns either and will therefore take 15?
Often, our opinion is formed based on past experiences—it is based on results, but on way more results than we are getting now. When faced with the choice between what five years of accumulated results have told you (which is what has formed your opinion, even if you don’t realize that this is why you hold it) and what last week’s results have told you, you should probably go with the larger sample size.
Sometimes, you have no basis to even have an opinion. If I had asked you, “who is the best player in the world?” a couple of months ago, you’d probably have said, “Yuuya Watanabe.” If I ask you now, your answer is probably not going to be Yuuya Watanabe. Yuuya, and his main competitors, are probably as good now as they were three months ago, but you don’t actually know which player is more skilled. You have no idea which one of them plays better. Let’s be honest—if I have no idea who the best player is, it’s not very likely that you do.
Still, you have an answer! (And so do I, of course). The question was “who is better?” not “who won the most?” but since you don’t have any information to answer that, you make the leap—“he won the most, so he is probably the best”—since “who won the most” is something that you can actually answer, much like “who is taller.” That is not a theory; it’s hardly even an opinion. Your answer is going to be based 100% on “facts,” even if those facts are not necessarily enough to answer your question. That is a valid way of thinking—it is not optimal, but it’s the best you’ve got.
Now, let’s say you’ve interacted with Yuuya and the other “contenders,” and you think Finkel is the better player, for good reasons. In this case, your theory is not very strong—you have an educated guess, but you are not sure. Yuuya had a better season than Finkel, but not by much. Your theory may be weak, but your empirical evidence is also weak. In this case, the difference in results should not be enough to sway you.
Now, imagine I think the best player in the world is my neighbor, Josh Smith, for good reasons. Josh has been to many tournaments and has never done well. Now, I have an opinion, and I have theory behind it, but evidence overwhelms my theory—it’s more likely that I am wrong in my initial theory than that a statistical anomaly has been happening for all those years.
In the [card]Blinkmoth Nexus[/card] example, there is no number of times this scenario could repeat that would make me change my mind—I know that the correct, probabilistic play is not to kill your own [card]Blinkmoth Nexus[/card], and I will stand by that no matter what else happens. I can’t even find a reasonable explanation for deviating, so I’m just never going to do it.
Strength of Empirical Evidence
Now, let’s go back to the coin-flipping scenario. I’d say that my theory—that the coin is equally likely to come up heads or tails—is pretty strong, so I’m not likely to be swayed from it.
Imagine, however, that you flip a coin 1,000 times, and it comes up tails all 1,000 times. If I have to bet on what it’s going to come up next, I’d bet tails. I’d even bet tails at 49/51 odds. Theoretically, I know it should be 50/50, but a thousand flips all coming up tails would hint at something not being right. Maybe the coin is loaded? Maybe it has two tails instead of a heads and a tails? I don’t need to know why it is happening; I’m still going to go with what is happening, because the evidence in this case is very strong and my theory, while also strong, is not infallible. I don’t know that the coin is an actual 50/50 coin.
Then why doesn’t this apply to Magic all the time? Because, to overturn theory, you need a lot more data than we usually have. In Magic, we usually flip three coins, not one thousand, and three flips is nowhere near enough of a deviation to conclude that something is wrong with the theory. As such, we’re much better served by understanding why things are happening than by knowing what is happening, because “what” is a lot more misleading when you have such a small sample size.
Sometimes, it’s tough to know when your empirical evidence is strong enough. One of the things to look for is a repeatable factor. Did the play “work out” because of something that happened once, or is that thing going to happen every time? If I keep a 0-lander and succeed, obviously I can’t expect to draw a lot of lands in a row every time—this is not a repeatable event and so shouldn’t affect my future experiences much. If I kept a slow hand and got overwhelmed by a normal draw, well, “having a normal draw” is a very repeatable experience, so that holds more weight. I can reliably expect it to happen in the future so it should affect my decision.
When we were playtesting UWR, for example, we would often get to the late game and lose anyway. Every time, I would think, “man, this was atypical, I should win the long game.” Except it kept happening, so maybe it wasn’t atypical after all. I decided to look for why that was happening—the main culprit was simply drawing a lot of air (lands, [card]Pillar of Flame[/card], etc) and no [card]Sphinx’s Revelation[/card]. Since the deck did have a lot of air and only 3 Revelations, that seemed to be a repeatable effect—it made sense that this would cause me to lose games, and we could not dismiss those results as an anomaly, so we had to investigate more.
It’s also important to know if you’re using the right examples for the right reasons, because sometimes it’s hard to understand what exactly is the correlation between two things. For example, imagine this scenario:
Hypothesis: Being able to [card]Mana Leak[/card] [card]Squadron Hawk[/card] is the most important thing in the matchup, so UB should be on the play.
Facts: Wafo-Tapa chooses to draw with UB versus UW, and Wafo-Tapa wins more with UB than anyone else, so UB should be on the draw.
The problem here is that the correlation between the two things is not proven, or even implied, by your facts. There is no way of knowing that Wafo-Tapa is winning because he is choosing to draw. Perhaps Wafo-Tapa is just a better player than the other UB pilots and wins in spite of choosing to draw. It doesn’t actually matter how many matches Wafo-Tapa plays—here the sample size is not the issue; a correlation that might not exist is. In this scenario, you would need to analyze what might be causing those wins. Assuming the build is the same and no one is cheating, they are:
1) The fact that Wafo-Tapa is on the draw.
2) The fact that Wafo-Tapa is better or that his particular opposition is worse.
3) The fact that Wafo-Tapa has been drawing better than the other people.
We can somewhat eliminate number 3 due to the big sample size, but we still have variables 1 and 2 and no idea which is causing the effect. To find out, we need to isolate each of them. The correct way would be to have Wafo-Tapa play a lot of matches against the same opponents on the play and then on the draw, and to have other people play matches on the play and on the draw. If everyone wins more on the draw, then you can reasonably conclude that you would rather be on the draw. If only Wafo-Tapa wins more on the draw, then there is another factor causing the wins—you’d need to analyze further.
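The confounding between variables 1 and 2 can be illustrated with a toy simulation (all the win-rate numbers here are invented for illustration): build a world where skill matters and being on the draw does nothing, and the naive comparison still makes the draw look great, while the isolated comparison reveals the truth.

```python
import random

random.seed(42)  # reproducible toy experiment

# Hypothetical win probabilities: skill matters, being on the draw does not.
SKILL_EDGE = 0.15  # our stand-in "better player" wins 15% more often
DRAW_EDGE = 0.00   # the draw contributes nothing in this toy world

def play_match(skilled: bool, on_draw: bool) -> bool:
    p = 0.50 + (SKILL_EDGE if skilled else 0.0) + (DRAW_EDGE if on_draw else 0.0)
    return random.random() < p

def rate(skilled: bool, on_draw: bool, n: int = 50_000) -> float:
    return sum(play_match(skilled, on_draw) for _ in range(n)) / n

# Naive comparison: the strong player always chooses to draw, everyone
# else is on the play -> "the draw" appears to be worth ~15%.
print(rate(skilled=True, on_draw=True))    # roughly 0.65
print(rate(skilled=False, on_draw=False))  # roughly 0.50

# Isolated comparison: same player, play vs. draw -> no difference at all.
print(rate(skilled=True, on_draw=False))   # also roughly 0.65
```

Only the last comparison—the same player on the play and on the draw—actually tests the variable in question.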
Note that, even in the scenario where everyone wins more on the draw, it doesn’t mean drawing is better. It’s possible that being on the draw leads to a certain playstyle—i.e. more defensively—that improves the matchup. People on the play try to be more aggressive, because they’re on the play, and that doesn’t work, so they win more on the draw. Perhaps they should just choose to play but still keep a defensive posture.
Of course, there is not enough time to test all that, you have to draw the line somewhere. Again, it’s going to depend on how much you trust the theory. If you have absolutely no idea if playing or drawing is better (which is not common, because playing should seem better to you in almost every situation, but imagine it doesn’t), then you can be results-oriented: “Wafo-Tapa wins more on the draw, so I’ll choose to draw.” If you have a reason to believe that is wrong, though (such as because you think being able to [card]Mana Leak[/card] a [card]Squadron Hawk[/card] is important), then Wafo-Tapa winning more is not enough evidence to contradict that and you would want to explore other options and analyze further before you draw a conclusion from that.
Note that I am not saying you have to do it. Only that, if you have a thought that makes sense and you still want to be results-oriented, you need more results than that. When I was faced with that exact situation, I did not choose to explore further. I thought my theory was solid enough that it would take too long to acquire evidence (i.e. results) that would convince me to throw it away, so I just went with the theory and chose to play.
Another similar example came from the semi-finals of PT Philadelphia. Sam Black was playing the Mono-Blue shoal deck (with [card]Inkmoth Nexus[/card]/[card]Blazing Shoal[/card]/[card]Dragonstorm[/card]) and Josh Utter-Leyton was playing our big Zoo deck. Sam had a couple friends playtest the matchup for him, and after playing they concluded that being on the draw was good for that matchup.
To them, that made sense. They concluded the matchup was about attrition and therefore the person on the draw would have an advantage. So, if empiricism corroborated their theory somewhat, what’s wrong with it? Well, the thing is, this was not their theory. It only became their theory to explain the phenomenon they experienced. They did what I did with the coin. “If it came up tails 1000 times, then maybe it’s loaded,” rather than, “the coin is loaded—see, tails 1000 times!” which would be a completely different scenario. Their initial hypothesis had to be that being on the play was better—historically that has been the case for pretty much every Zoo versus combo matchup (and in fact almost every matchup), so I’m sure they started with that—much like I started with a 50/50 coin because that is the norm.
Our theory was that being on the play was in fact very important, because, other than the normal reasons, it would let Josh develop his board without fear of dying. If Sam is on the play, he can lead with [card]Inkmoth Nexus[/card] and then Josh might just not be able to afford playing a guy on turn one, because if he does it and Sam has the combo then he just dies. If Sam is on the play, he can play a turn two [card]Blighted Agent[/card] and then Josh has to kill it, so he can’t apply much pressure. If Josh is on the play, though, then that turn spent killing the Agent is going to be followed up by an attack for five.
That is a very strong argument. To assume that we were wrong, I would need a lot of matches played. Sure, if we had played 100 matches and the person on the draw had won 80% of them, I would choose to draw, even if I didn’t exactly understand it. But if I’ve played 10 matches? No way. I don’t think Sam’s friends played 100 matches. I don’t think they had enough statistical significance to throw away something that makes sense based on the results of those few matches. They had a strong hypothesis, and weak empirical evidence against it, and they chose to go with the empirical evidence, and I think that was a mistake.
Judging Their Relative Strengths
“So, what if they played 25 games?” you might ask. Well, I don’t know. I am not a statistics or math student, I wouldn’t know how to calculate how many matches you need. It is mostly how you feel about things—if something makes sense to you, then don’t let a couple of games change your mind! Theory, understanding, that is worth a lot more than some results. If I played 25 games of Zoo versus poison and the person on the draw won every game, I would probably choose to draw. If I played 25 games and the person on the draw won 65% of them, I’d probably still choose to play. If I played 500 games and the person on the draw won 60% of them, I’d probably choose to draw. Again, there aren’t hard numbers for this—you need to see how strong you think your hypothesis and your empirical evidence are.
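For those who do want a rough number, the standard statistical tool is a binomial tail probability: assuming play/draw were truly 50/50, how likely is a result at least this lopsided? A sketch, using the game counts from this paragraph:

```python
from math import comb

def tail_prob(wins: int, games: int, p: float = 0.5) -> float:
    """P(at least `wins` wins in `games` games) if the true win rate is p."""
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(wins, games + 1))

# 65% of 25 games (16 wins): entirely plausible under a true 50/50.
print(tail_prob(16, 25))    # roughly 0.11 — weak evidence
# 60% of 500 games (300 wins): essentially impossible under a true 50/50.
print(tail_prob(300, 500))  # far below 0.001 — very strong evidence
```

This matches the intuition above: 65% over 25 games happens about one time in nine by pure chance, while 60% over 500 games almost never does, so only the latter should overturn a solid theory.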
Take, for example, our Esper deck. We played Esper at the PT and, though we did all right, we didn’t do very well. I myself went 2-3 with it. Still, I thought the deck was good—the theory behind it still made sense to me, it should be good. So I played it again in Quebec.
I did OK in Quebec. I Top 64’d. I lost some matches that I really didn’t think I would have lost under normal conditions. I thought that maybe I should not play that deck again, but I couldn’t understand why I was losing. More importantly, I couldn’t understand why everyone else was losing with the deck—pretty much no one had done well with it.
I decided that the sample size was still too small, that my theory was still sound, and that the things that had led me to losing were not repeatable factors. Surely not being able to cast Verdict on turn four ever or drawing five lands in a row in the late game are not “normal” occurrences. More importantly, I didn’t want to be a results-oriented person. After all, one of the marks of a great player is that he is able to make the correct play 10 times, fail 10 times, and make the correct play again the 11th time. This applies to deck selection too, so I decided to stick to Esper.
I played Esper again in GP Rio. Again I did badly, and again no one else did well with it. At that point, I knew I was beaten—though I did not know precisely why everyone was losing with it, I knew I wouldn’t play the deck again unless I found a good explanation for why I was losing and how I was going to fix it.
At that point, I felt like results overwhelmed my theory. I had a mid-level theory, but strong results against it. Things that had happened once or twice before were happening way too often, to way too many people. Perhaps the deck did have a problem with flooding in the late game, perhaps it did have a problem with colors. So, even though I still think Esper seems good, I no longer think it is good, at least not in the version I played it.
So, to sum it up:
• Being results-oriented is usually a bad thing. A theoretical grasp of what is happening is more important than previous results for determining what will happen in the future.
• If you don’t have a theory or if it is not very reliable, then you don’t need a lot of results to direct you. In this case, it’s OK to be results-oriented.
• If you have a theory but your results go against it, try to judge both. The stronger and better cemented your theory is, the more reliable data you need to go against it.
I hope you’ve enjoyed this,
See you next week!