 # An Introduction to the Hypergeometric Distribution for Magic Players

How often will you draw a specific spell in your opening hand? How often will you draw multiples? How reliable are cards like Ancient Stirrings or Collected Company? All of these questions can be answered with the help of a hypergeometric distribution. By using a hypergeometric calculator, getting the percentages is easy, and in this article I’ll explain how.

# Using a Hypergeometric Calculator

The hypergeometric distribution can describe the likelihood of any number of successes when drawing from a deck of Magic cards. It takes into account the fact that each draw decreases the size of your library by one, and therefore the probability of success changes on each draw.

To be able to apply the hypergeometric distribution, it is essential that your deck can be classified into two mutually exclusive categories. A probability textbook might talk about success/failure, but in a Magic context it could be creature/non-creature for the purpose of Collected Company, or colorless/colored for the purpose of Ancient Stirrings.

The easiest way to get some quick probabilities is by using an online hypergeometric calculator. I like the one from Stattrek.com. To use it, we need to plug four numbers into the calculator:

• Population size. In a Magic context, this is the number of cards in the deck when the card draw experiment starts. For example, if we have a 60-card Modern deck and we’re calculating the probability of drawing something in our opening hand, then this would be 60.
• Number of successes in population. For Magic, this is the number of functionally identical copies of cards in our deck that we want to draw. For example, if we’re calculating the probability of drawing one of our four Hardened Scales in our opening hand, then this would be 4.
• Sample size. The number of cards we’re drawing. For example, if we are interested in our opening hand, then this would be 7.
• Number of successes in sample. How many copies we want to draw. For example, if we care about having one Hardened Scales in our opening hand, then this would be 1.

If we plug in the numbers, then the output is a set of probabilities. Note that to convert them into a percentage, you multiply by 100%. So a probability of 0.399 is the same as 39.9%. The way to interpret this output is as follows:

• Hypergeometric Probability: P(X = 1). The probability of drawing exactly 1 Hardened Scales. This is usually not what we’re interested in. Instead, the probability of drawing at least one Hardened Scales tends to be more relevant.
• Cumulative Probability: P(X < 1). The probability of drawing 0 copies. This would constitute the likelihood of a failure, i.e., a 7-card opening hand without Hardened Scales.
• Cumulative Probability: P(X ≤ 1). The probability of drawing 0 or 1 copies. This is generally not interesting to us.
• Cumulative Probability: P(X > 1). The probability of drawing 2 or more copies. This is generally not interesting to us.
• Cumulative Probability: P(X ≥ 1). The probability of drawing at least 1 copy. Usually, this is the number that we are interested in. So, the probability to draw at least one of your four Hardened Scales in your opening hand from a 60-card deck is 39.9%.
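
The five outputs listed above can also be reproduced with a few lines of code. The following is a minimal sketch, not the calculator's actual implementation: it assumes Python 3.8 or later (for `math.comb`), and the helper name `hypergeom_pmf` and the variable names are chosen purely for this illustration.

```python
from math import comb

def hypergeom_pmf(k, population, successes, sample):
    """P(X = k): exactly k successes in a sample drawn without replacement."""
    if k < 0 or k > sample:
        return 0.0
    return (comb(successes, k)
            * comb(population - successes, sample - k)
            / comb(population, sample))

# Four Hardened Scales in a 60-card deck, 7-card opening hand, looking for 1 copy.
N, K, n, k = 60, 4, 7, 1

exactly = hypergeom_pmf(k, N, K, n)                            # P(X = 1)
less_than = sum(hypergeom_pmf(i, N, K, n) for i in range(k))   # P(X < 1)
at_most = less_than + exactly                                  # P(X <= 1)
more_than = 1 - at_most                                        # P(X > 1)
at_least = 1 - less_than                                       # P(X >= 1)

for label, value in [("P(X = 1)", exactly), ("P(X < 1)", less_than),
                     ("P(X <= 1)", at_most), ("P(X > 1)", more_than),
                     ("P(X >= 1)", at_least)]:
    print(f"{label:10s} {value:.3f}")
```

Rounded to three decimals, the values are 0.336, 0.601, 0.937, 0.063, and 0.399, in line with the numbers quoted in this article.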

An alternative to this online hypergeometric calculator is the use of Excel or Google Sheets. As an example on how to use those tools, the “failure” probability of drawing exactly 0 Hardened Scales in your opening hand when you’re playing a 60-card deck with 4 copies is calculated as follows:

• In Excel: =HYPGEOM.DIST(0,7,4,60,FALSE)
• In Google Sheets: =HYPGEOMDIST(0,7,4,60)

In the remainder of this article, I will use the Google Sheets notation to represent hypergeometric probabilities.
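
If you prefer a script to a spreadsheet, the same argument order (successes in the sample, sample size, successes in the population, population size) can be mirrored in Python. This is only a sketch under assumptions: the function name `hypgeom_dist` is made up for this example, it relies on `math.comb` from Python 3.8 or later, it does no input validation, and its optional `cumulative` flag imitates the fifth argument of Excel's HYPGEOM.DIST.

```python
from math import comb

def hypgeom_dist(num_successes, num_draws, successes_in_pop, pop_size,
                 cumulative=False):
    """Spreadsheet-style argument order: (k, n, K, N).

    cumulative=False returns P(X = k); cumulative=True returns P(X <= k).
    """
    def pmf(k):
        return (comb(successes_in_pop, k)
                * comb(pop_size - successes_in_pop, num_draws - k)
                / comb(pop_size, num_draws))

    if cumulative:
        return sum(pmf(k) for k in range(num_successes + 1))
    return pmf(num_successes)

# Probability of an opening hand without Hardened Scales (4 copies, 60 cards).
miss = hypgeom_dist(0, 7, 4, 60)
print(f"{miss:.3f}")
```

Calling `hypgeom_dist(1, 7, 4, 60, cumulative=True)` would give the probability of drawing zero or one copy.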

# Useful Probability Rules

Sometimes, you want to compute the probability of some event from the known probabilities of other events. The three most important rules to know for Magic purposes are:

• Rule of complements: Since the sum of probabilities of all possible events equals 1, the probability of “hitting” is equal to 1 minus the complementary probability of “not hitting.” For example, the probability of drawing at least one copy of Hardened Scales is equal to 1 minus the complementary probability of drawing zero copies, as you can verify in the example above: 1 – 0.601 = 0.399.
• Rule of addition: In probabilities, “or” means addition. More specifically, if events A and B are mutually exclusive (which means that they cannot occur at the same time, such as when A represents drawing exactly zero Hardened Scales in your opening hand and B represents drawing exactly one Hardened Scales in your opening hand) then the probability of A or B is equal to the probability that event A occurs plus the probability that event B occurs. You can verify this in the example above: 0.336 + 0.601 = 0.937.
• Rule of multiplication: In probabilities, “and” means multiplication. More specifically, if events A and B are independent (which means they don’t affect each other’s occurrence, such as when A represents drawing zero Hardened Scales in game 1 and B represents the same for game 2) then the probability of A and B is equal to the probability that event A occurs times the probability that event B occurs. So you’ll miss Hardened Scales in your opening hand two games in a row with probability 0.601 * 0.601 = 0.361.
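
The three rules can be checked numerically against the Hardened Scales example. The snippet below is an illustrative sketch that assumes Python 3.8 or later; the helper `pmf` and the variable names are my own and not part of any calculator.

```python
from math import comb

def pmf(k, n, K, N):
    """P(exactly k hits) when drawing n cards from N cards, K of which are hits."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p0 = pmf(0, 7, 4, 60)  # no Hardened Scales in the opening hand
p1 = pmf(1, 7, 4, 60)  # exactly one Hardened Scales in the opening hand

at_least_one = 1 - p0  # rule of complements
zero_or_one = p0 + p1  # rule of addition (mutually exclusive events)
miss_twice = p0 * p0   # rule of multiplication (two independent games)

print(f"{at_least_one:.3f} {zero_or_one:.3f} {miss_twice:.3f}")
```

The three printed values round to 0.399, 0.937, and 0.361.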

# The Underlying Math

You don’t need to understand the underlying formula to apply hypergeometric probabilities, but some intuition will help, especially when I cover the multivariate hypergeometric distribution next week.

In a hypergeometric distribution with population size N, K successes in the population, and a sample size n, the probability to observe k successes in the sample is given by: $\frac{ \binom{K}{k} \binom{N-K}{n-k} }{ \binom{N}{n} }.$

One way to understand this formula, which uses the standard notation for the binomial coefficient, is that the numerator is the number of possible draws that we classify as successes, whereas the denominator is the count of all possible draws. Indeed, the denominator is simply the number of ways to choose n cards from a deck of size N.

To interpret the numerator, notice that it represents the number of ways to choose k elements from a set of K successes, which is exactly what we are looking for, multiplied by the number of ways to choose the n-k elements from a set of N-K failures, which is basically an automatic given.
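
As a worked instance of the formula, take the Hardened Scales numbers used earlier (N = 60, K = 4, n = 7) and ask for k = 0 copies in the opening hand. There is exactly one way to choose zero of the four copies, and the other seven cards must all come from the 56 remaining cards:

$\frac{ \binom{4}{0} \binom{56}{7} }{ \binom{60}{7} } = \frac{ 1 \cdot 231{,}917{,}400 }{ 386{,}206{,}920 } \approx 0.6005.$

This is the same 60.1% failure probability that the calculator reports as P(X < 1).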

# Examples for Deck Building

So all these probabilities are nice, but how do we use or apply them? Let’s go over a number of examples.

### Example 1: What is the probability of drawing between two and four lands in your opening hand with a 25-land Constructed deck?

Using the rule of addition, this probability is described by HYPGEOMDIST(2,7,25,60) + HYPGEOMDIST(3,7,25,60) + HYPGEOMDIST(4,7,25,60) = 77.84%. This means that the complementary probability of having an opening hand with 0, 1, 5, 6, or 7 lands is 22.16%.

That’s still relatively high when you consider that such land-light or land-heavy hands often result in a mulligan. This is exactly why experienced deck builders like card selection spells so much. And as an interesting fact: the probability of drawing between two and four lands in your opening hand is maximized with 25 lands. If you play fewer than 25 lands or more than 25 lands in your 60-card deck, the probability of having an opening hand with two, three, or four lands will be lower.
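
The claim about 25 lands can be checked by sweeping over every possible land count. This is a rough sketch assuming Python 3.8 or later; the function names are hypothetical and exist only for this example.

```python
from math import comb

def pmf(k, n, K, N):
    """P(exactly k hits) when drawing n cards from N cards, K of which are hits."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def prob_two_to_four_lands(lands, deck=60, hand=7):
    """Rule of addition over the mutually exclusive outcomes 2, 3, and 4 lands."""
    return sum(pmf(k, hand, lands, deck) for k in (2, 3, 4))

# Show the neighborhood around 25 lands.
for lands in range(22, 29):
    print(lands, f"{prob_two_to_four_lands(lands):.2%}")

# Land count that maximizes the probability of a 2-4 land opening hand.
best = max(range(0, 61), key=prob_two_to_four_lands)
print("best land count:", best)
```

The sweep reports 25 as the best land count, with 24 and 26 lands only slightly behind, so the maximum is a flat one.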

### Example 2: What is the probability of Goblin Charbelcher killing the opponent with a 50-card library containing five Mountains?

Assuming that the opponent is at 20 life and the library holds five Mountains and 45 spells, you need to hit zero Mountains in your top 10 cards. This probability is described by HYPGEOMDIST(0,10,5,50)=31.1%.
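
If you ever doubt a hypergeometric result, a quick simulation is a useful sanity check. The sketch below is my own illustration, not part of the original calculation: it assumes Python 3.8 or later, uses a fixed seed so the run is repeatable, and treats a random 10-card sample as the top 10 cards of a shuffled library.

```python
import random
from math import comb

# Exact value: no Mountain among the top 10 cards of a 50-card library.
exact = comb(45, 10) / comb(50, 10)

rng = random.Random(2024)  # fixed seed, chosen arbitrarily
library = ["Mountain"] * 5 + ["spell"] * 45
trials = 50_000

# Count the trials in which the top 10 cards contain no Mountain.
hits = sum("Mountain" not in rng.sample(library, 10) for _ in range(trials))
estimate = hits / trials

print(f"exact: {exact:.3f}, simulated: {estimate:.3f}")
```

The simulated frequency should land close to the exact 31.1%, with the small gap shrinking as the number of trials grows.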

### Example 3: What is the probability of having at least two lands by turn 2 after keeping a one-land hand on the draw in Limited?

Assuming that you registered a 40-card deck with 17 lands, your library now contains 33 cards including 16 lands. Given that you have two draw steps, the probability of not finding the second land in time is described by HYPGEOMDIST(0,2,16,33)=25.8%. This means that the complementary probability of hitting your second land drop is 74.2%.

It’s risky, but if you have a 2-mana spell that produces additional mana or searches for additional lands, it may be a risk worth taking.

### Example 4: What is the probability of starting the game with a Leyline of the Void if you play four and are always willing to mulligan to six?

To determine this, we need to combine various probabilities. First of all, the probability of drawing at least one Leyline of the Void in your 7-card opening hand when you run four copies in a 60-card deck is equal to a number that we’ve already seen in a Hardened Scales context: 39.9%. Given this, the probability that you would mulligan down to six is 60.1%.

Once you mulligan to six, the probability of seeing at least one Leyline is 1-HYPGEOMDIST(0,6,4,60)=35.1%. So if you are always willing to mulligan down to six in search of a Leyline but keep every six-card hand, then the probability of starting with a Leyline is 39.9% + 60.1% * 35.1% = 61%. Here, I used both the rule of addition and the rule of multiplication.
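
The mulligan calculation can be written out step by step. This sketch assumes Python 3.8 or later and, like the reasoning above, treats each new hand as a fresh draw from the full, shuffled 60-card deck; the function name `miss` is hypothetical.

```python
from math import comb

def miss(hand_size, copies=4, deck=60):
    """P(no copy of the card in a freshly drawn hand of the given size)."""
    return comb(deck - copies, hand_size) / comb(deck, hand_size)

keep_seven = 1 - miss(7)   # Leyline in the 7-card hand
find_on_six = 1 - miss(6)  # Leyline in the 6-card hand after a mulligan

# Rule of addition over two exclusive paths; the second path uses the
# rule of multiplication: miss on seven AND hit on six.
start_with_leyline = keep_seven + (1 - keep_seven) * find_on_six

print(f"{start_with_leyline:.1%}")
```

With unrounded inputs the result is about 61.1%, which agrees with the 61% above up to rounding of the intermediate percentages.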

### Example 5: What is the probability of hitting two creatures with Collected Company in Bant Spirits?

Let’s suppose that we have registered a deck with 30 creatures. To figure out the probability of hitting at least two creatures with Collected Company, we need to know the library size and the number of creatures remaining when we cast the spell.

If we consider a specific situation, say when we cast Collected Company when our library contains 50 cards, 26 of which are creatures, then the probability of hitting two or more creatures in the top 6 is given by 1 – [ HYPGEOMDIST(0,6,26,50) + HYPGEOMDIST(1,6,26,50) ] = 92.2%.

But there is no guarantee we would end up in this exact scenario where we drew four creatures in our top 10. In a real game, it might have been 3. Or 5. Or some other number. Or maybe we would cast Collected Company with a 45-card or 40-card deck. Basically, all kinds of situations could occur. But for deck building purposes, we usually just want one number: the expected hit probability when casting Collected Company in a game.

There are various ways to define and model this notion of “casting Collected Company in a game.” We could factor in mulligans, condition on drawing certain combinations of cards, assume a certain turn of the game, and so on. The approach that I find most appealing (because it’s simple, relevant for a deck with any number of Collected Companies, and applicable throughout the entire game) is to simply remove one Collected Company from the deck, blindly draw any number of cards (representing my opening hand and any number of draw steps), and finally put it on the stack.

Since blindly exiling cards from the top of our library is the same as blindly exiling them from the bottom, which clearly wouldn’t influence anything as long as there are still six left for Collected Company, the probability that we’re interested in is the same as when we put Collected Company on the stack for our 59-card library. I don’t condition on drawing at least four lands because the probability of having enough mana to cast Collected Company is very high (and thus its influence on our library is negligible) and increasing every turn, which would defeat the purpose of having a simple, relevant number that stays applicable throughout the entire game.

So by removing one Collected Company from the deck and putting it on the stack with the remaining 59-card deck containing 30 creatures, the probability of hitting two or more creatures is given by 1 – HYPGEOMDIST(0,6,30,59) – HYPGEOMDIST(1,6,30,59) = 91.0%.
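
Both Collected Company numbers follow the same pattern: one minus the probability of seeing too few hits. A small helper makes that explicit. This is a sketch assuming Python 3.8 or later, and the name `at_least` and its parameters are invented for this example.

```python
from math import comb

def at_least(min_hits, look, hits_in_library, library):
    """P(at least min_hits hits among the top `look` cards of the library)."""
    total = comb(library, look)
    # Number of draws with fewer than min_hits hits.
    too_few = sum(
        comb(hits_in_library, k) * comb(library - hits_in_library, look - k)
        for k in range(min_hits)
    )
    return 1 - too_few / total

specific = at_least(2, 6, 26, 50)  # 50-card library with 26 creatures
general = at_least(2, 6, 30, 59)   # 59-card library with 30 creatures

print(f"{specific:.1%} {general:.1%}")
```

The same helper works for any "look at the top X cards" effect by changing the four arguments.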

### Example 6: What is the probability of drawing two or more burn spells with Risk Factor in Burn?

Consider a typical Modern 60-card Burn deck with 28 burn spells. Using the same logic as I explained for Collected Company, I will remove one Risk Factor from the deck and put it on the stack with the remaining 59-card deck. The probability of hitting two or more burn spells in the top 3 is then given by HYPGEOMDIST(2,3,28,59) + HYPGEOMDIST(3,3,28,59) = 46.1%.

### Example 7: What is the probability of hitting a land with Ancient Stirrings in Tron?

Consider a typical Modern 60-card Tron deck with 19 lands. Using the same logic as I explained for Collected Company, I will remove one Ancient Stirrings from the deck and put it on the stack with the remaining 59-card deck. The probability of hitting a land in the top 5 is given by 1 – HYPGEOMDIST(0,5,19,59) = 86.9%.

### Example 8: Assuming no mulligans, how many copies of a certain type of card do you need to be 80% sure you’ve drawn at least one by turn 6 on the play?

If you try out some numbers in a hypergeometric calculator, or just check out my article on the subject, you’ll discover that the answer is 7. Indeed, given that you’ve seen 12 cards by turn 6 on the play, we have 1 – HYPGEOMDIST(0,12,7,60) = 80.9%, and with only 6 copies you’d be below 80%.
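
Instead of trying numbers by hand, you can let a short loop search for the smallest number of copies that reaches the target. This is a minimal sketch assuming Python 3.8 or later; the function name and the 80% threshold variable are illustrative choices.

```python
from math import comb

def prob_at_least_one(copies, cards_seen, deck=60):
    """Rule of complements: 1 minus the probability of seeing zero copies."""
    return 1 - comb(deck - copies, cards_seen) / comb(deck, cards_seen)

cards_seen = 7 + 5  # opening hand plus the draws on turns 2 through 6 (on the play)
target = 0.80

# Smallest number of copies that reaches the target probability.
needed = next(c for c in range(1, 61)
              if prob_at_least_one(c, cards_seen) >= target)

print(needed, f"{prob_at_least_one(needed, cards_seen):.1%}")
print(needed - 1, f"{prob_at_least_one(needed - 1, cards_seen):.1%}")
```

Changing `cards_seen` or `target` answers the same question for other turns or other consistency requirements.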

# What is an Acceptable Percentage?

When building decks, you can use hypergeometric probabilities to get an idea of success and failure rates, but what is an acceptable percentage, and how should that translate to how many copies of a certain type of card to play? The answer depends on a lot of factors, most importantly the impact of things going right or wrong. But I can give some general guidelines, based on experience and successful tournament decks:

• 90% is an acceptable success rate when the failure case is a disaster. For example, not having the right colored source of mana to play your spells, not hitting a land with Satyr Wayfinder, or having to pay 3 mana for Daring Buccaneer. After all, being color screwed, playing a vanilla 1/1 for 2 mana, or casting a 2/2 for 3 mana is atrocious, and you really don’t want these things to happen more than once every ten games.
• 80% is an acceptable success rate when the failure case is not as bad. For example, not hitting any card with Glint-Nest Crane or Militia Bugler, or having to pay 3 mana for Wizard’s Lightning. After all, when you look at a 1/3 flyer for 2 mana, a 2/3 vigilance for 3 mana, or 3 damage for 3 mana, you’re slightly overpaying, but it’s not the end of the world—you’re still affecting the board in a relevant way. I might even accept 70% consistency, depending on the card availability in the format. In the end, it’s all a trade-off.

# Conclusion

Using the hypergeometric distribution to analyze the consistency of your deck or to figure out how many copies of a certain card to play is actually pretty easy, and I hope that this article was helpful in explaining how.

Next week, I’ll be back with an explanation of the multivariate hypergeometric distribution, which you need when your deck consists of more than two types of cards. For instance, if you want to know the probability of getting an opening hand with two Soul Spikes and four Chancellors of the Dross, you need to divide your deck into three categories: Soul Spikes, Chancellors, and other cards. In that case, the hypergeometric distribution doesn’t apply anymore, and you can’t use the rule of multiplication because the event of drawing two Soul Spikes and the event of drawing four Chancellors of the Dross are not independent. Next week, I’ll show how you can calculate these probabilities.
