An Introduction to the Multivariate Hypergeometric Distribution for Magic Players

Last week, I explained the hypergeometric distribution and how to use online hypergeometric calculators for Magic purposes. With these tools, it’s easy to analyze the consistency of your deck, how many copies of a certain card you should play, or how often your Collected Company will hit.

But to apply the hypergeometric distribution, it is essential that your deck can be classified into two mutually exclusive categories. In a Magic context, it could be creature/noncreature for the purpose of Collected Company, or land/nonland for the purpose of analyzing mana bases.

If you are interested in more complicated questions that require you to model your deck as a set of three or more types of cards, however, then the “basic” hypergeometric distribution no longer applies. For instance, if you want to know the probability of getting an opening hand with two Soul Spike and four Chancellor of the Dross, you need to divide your deck into three categories: Soul Spikes, Chancellors, and other cards. For those cases, you need the multivariate hypergeometric distribution.

Using an Online Multivariate Hypergeometric Calculator

I wasn’t even aware that an online tool existed until two readers pointed it out to me last week. Thanks to you both! I had always used spreadsheets or custom pieces of code to run the multivariate calculations myself, but it’s much easier to just plug a few numbers into an online tool, and I’m happy to share the discovery with you.

The one that both readers linked, made by Michael B. Moore, can be found here. It’s intuitive to use, and it worked perfectly for me. Here is how you can calculate the percentage probability that a certain combination of cards will occur in the top of your deck.

You have to click the “Add a card type” button twice for this specific example. It’s important to fill the final row with 48 other cards as well so that the calculator knows the total number of cards in the deck. I filled in the card names for clarity, but you don’t actually have to spell any card names to get the numbers.

What the above calculation reveals is that if you play a 60-card Eldrazi Tron deck in Modern without any card selection, then you would draw at least one copy of each Urza land in your top 10 cards in 12.6% of the games. So barring mulligans, you’ll have access to 7 mana on turn 3 on the draw approximately once per 8 games on average.

The Underlying Math

Without this online tool, the formula to get this percentage probability for Eldrazi Tron would be

\sum_{M=1}^4 \sum_{P=1}^4 \sum_{T=1}^{ \min(4,10-M-P) } \frac{ \binom{4}{M} \binom{4}{P} \binom{4}{T} \binom{48}{10-M-P-T} }{ \binom{60}{10} } = 12.6\%.

Running these numbers or even understanding this formula requires some proficiency with spreadsheets, a programming language, and/or mathematical notation. If you don’t care about any of these aspects, then feel free to skip right ahead to the “Examples for deck building” section below. But if you do want to learn more about the underlying calculations, or how to run some numbers that the Deck-u-lator tool is not immediately capable of, then let me explain.

Let’s start with a fundamental building block: the binomial coefficient. Using standard factorial notation, the number of ways to choose an (unordered) subset of n cards from a given library comprised of N cards is given by:

\binom{N}{n} = \frac{N!}{n!(N-n)!}.

In both Excel and Google Sheets, you would use =COMBIN(N,n), replacing N and n by their respective numbers or cells. For example, for N=4 and n=2, the resulting binomial coefficient is 6. Indeed, there are 6 ways to choose 2 cards from a library consisting of 4 different cards labeled A, B, C, and D: you have {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, and {C, D}.

To intuitively understand the general formula for the binomial coefficient, let’s stick with this 4-card library example and draw two cards. For the first card, there are 4 different options. Then, given any of those 4 options, there are 3 possibilities for the second draw. That gives 4 * 3 possible sequences in total, which is exactly what’s captured in the N! / (N-n)! part of the formula. But under this way of counting, drawing A first and B second is a distinct possibility from drawing B first and A second. Since we don’t actually care about the order in which we select cards (as is clear when you think of drawing your opening hand), we have to remove the duplicates. Since a set like {A, B} contains two elements, there are 2! ways to order it. This is true for all of those sets, which is precisely why we divide by n! in the formula. And there you have it—the binomial coefficient. Hurrah for combinatorics.
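If you prefer code to spreadsheets, the counting argument above can be sketched in a few lines of Python. This is only an illustration: the variable names are my own, and it relies on nothing beyond the standard library's math and itertools modules.

```python
from itertools import combinations, permutations
from math import comb, factorial

library = ["A", "B", "C", "D"]
N, n = len(library), 2

# Ordered draws: 4 options for the first card times 3 for the second.
ordered = list(permutations(library, n))
print(len(ordered))  # 12

# Unordered draws: each 2-card set was counted 2! times above.
unordered = list(combinations(library, n))
print(unordered)
print(len(unordered))  # 6

# The closed form N! / (n! (N-n)!) and the built-in agree with the enumeration.
by_formula = factorial(N) // (factorial(n) * factorial(N - n))
print(by_formula, comb(N, n))  # 6 6
```

Here math.comb(N, n) plays the same role as =COMBIN(N,n) in a spreadsheet.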

Now let’s use the binomial coefficients to determine some probabilities. Suppose we draft a deck with 16 lands (eight Forest and eight Plains) and 24 spell cards (18 creatures and 6 sorceries). Then the probability of drawing 3 lands and 4 spells in our opening hand is given by the hypergeometric probability:

\frac{ \binom{16}{3} \binom{24}{4} }{ \binom{40}{7} } = 31.9\%.

The denominator is the count of all possible draws, i.e., the number of ways to choose an (unordered) subset of 7 cards from a 40-card library. The numerator is then the count of all possible draws that we classify as a success, which we determine by multiplying the number of ways to choose an (unordered) subset of 3 lands from a fixed set of 16 by the number of ways to choose an (unordered) subset of 4 spells from a fixed set of 24.

The logic underlying this calculation can be extended. Suppose we tighten our classification of success by requiring specifically two Forest, one Plains, and four spells. This moves us into multivariate hypergeometric territory, but the probability of this event can be determined by simply splitting the first binomial coefficient in two:

\frac{ \binom{8}{2} \binom{8}{1} \binom{24}{4} }{ \binom{40}{7} } = 12.8\%.

Continuing along the same lines with our eight Forest, eight Plains, 18 creature, 6 sorcery deck, the probability of drawing two Forest, one Plains, three creatures, and one sorcery is given by:

\frac{ \binom{8}{2} \binom{8}{1} \binom{18}{3} \binom{6}{1} }{ \binom{40}{7} } = 5.9\%.

To define the multivariate hypergeometric distribution in general, suppose you have a deck of size N containing c different types of cards. Specifically, there are K_1 cards of type 1, K_2 cards of type 2, and so on, up to K_c cards of type c.  (The hypergeometric distribution is simply a special case with c=2 types of cards.) Given this deck, if we draw n cards without replacement, then the probability to draw exactly k_1 cards of type 1, k_2 cards of type 2, and so on, up to k_c cards of type c is given by:

\frac{ \binom{K_1}{k_1} \binom{K_2}{k_2} \ldots \binom{K_c}{k_c} }{ \binom{N}{n} }.
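For readers who would rather run this than read it, here is a minimal Python sketch of the general formula. The function name multivariate_hypergeom_pmf and its list-based arguments are my own choices for illustration, not part of any library or of the online tool.

```python
from math import comb

def multivariate_hypergeom_pmf(deck_counts, drawn_counts):
    """Probability of drawing exactly drawn_counts[i] cards of each type i.

    deck_counts  -- K_1, ..., K_c (they sum to the deck size N)
    drawn_counts -- k_1, ..., k_c (they sum to the number of cards drawn n)
    """
    ways = 1
    for K, k in zip(deck_counts, drawn_counts):
        ways *= comb(K, k)  # comb returns 0 when k > K
    return ways / comb(sum(deck_counts), sum(drawn_counts))

# Two types (16 lands, 24 spells): the ordinary hypergeometric case.
p_lands = multivariate_hypergeom_pmf([16, 24], [3, 4])
print(f"{p_lands:.1%}")  # 31.9%

# Three types: 8 Forest, 8 Plains, 24 spells.
p_split = multivariate_hypergeom_pmf([8, 8, 24], [2, 1, 4])
print(f"{p_split:.1%}")  # 12.8%

# Four types: 8 Forest, 8 Plains, 18 creatures, 6 sorceries.
p_exact = multivariate_hypergeom_pmf([8, 8, 18, 6], [2, 1, 3, 1])
print(f"{p_exact:.1%}")  # 5.9%
```

The three calls reproduce the 40-card draft deck numbers from this section.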

This formula represents the probability of drawing a certain, exact combination of cards. Yet often, we are happy as long as we draw at least a certain combination of cards, without caring about the contents of the other draws. For example, we might be interested in the probability of drawing at least two Forest, at least one Plains, and at least three creatures in our 7-card opening hand, with no restrictions on the seventh card. To determine the probability of drawing at least this combination, we would have to sum all possibilities for the seventh card:

\frac{ \binom{8}{3} \binom{8}{1} \binom{18}{3} \binom{6}{0} }{ \binom{40}{7} } + \frac{ \binom{8}{2} \binom{8}{2} \binom{18}{3} \binom{6}{0} }{ \binom{40}{7} } + \frac{ \binom{8}{2} \binom{8}{1} \binom{18}{4} \binom{6}{0} }{ \binom{40}{7} } + \frac{ \binom{8}{2} \binom{8}{1} \binom{18}{3} \binom{6}{1} }{ \binom{40}{7} }= 15.0\%.
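The same sum can be built by brute force. The sketch below, under my own naming, loops over every hand composition that meets the minimums and adds up the favorable counts. It assumes the 40-card draft deck from above, where all four card types are listed explicitly.

```python
from itertools import product
from math import comb

deck = {"Forest": 8, "Plains": 8, "creature": 18, "sorcery": 6}
minimum = {"Forest": 2, "Plains": 1, "creature": 3, "sorcery": 0}
hand_size = 7

names = list(deck)
ranges = [range(minimum[t], deck[t] + 1) for t in names]

favorable = 0
qualifying_hands = []
for ks in product(*ranges):
    if sum(ks) != hand_size:  # keep only complete 7-card hands
        continue
    ways = 1
    for t, k in zip(names, ks):
        ways *= comb(deck[t], k)
    favorable += ways
    qualifying_hands.append(ks)

p_at_least = favorable / comb(sum(deck.values()), hand_size)
print(qualifying_hands)     # the four compositions, one per choice of seventh card
print(f"{p_at_least:.1%}")  # 15.0%
```

Each tuple in qualifying_hands corresponds to one of the four fractions in the sum.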

This is also essentially what’s going on with the Urza land formula that started this section, albeit with a more complicated, symbolic summation:

\sum_{M=1}^4 \sum_{P=1}^4 \sum_{T=1}^{ \min(4,10-M-P)} \frac{ \binom{4}{M} \binom{4}{P} \binom{4}{T} \binom{48}{10-M-P-T} }{ \binom{60}{10} } = 12.6\%.

For interpretation, the number of Urza’s Mines, M, ranges from 1 to 4, and the number of Urza’s Power Plants, P, ranges from 1 to 4 as well. The number of Urza’s Towers, T, also ranges from 1 to 4 if possible, but we have to keep in mind that we cannot draw, say, four Mines, four Power Plants, and four Towers in 10 cards. Given M and P, the maximum number of Urza’s Towers we can draw is the smaller of 4 and 10-M-P, which is reflected in T’s upper bound of summation. For any (M, P, T)-combination in this range, the number of other cards drawn in the top 10 is given by 10-M-P-T. We determine the corresponding multivariate hypergeometric probability for each of the 60 such (M, P, T)-combinations in this range and add them all together.
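As a rough sketch of how the triple summation could be coded, here is a Python version that mirrors the three nested sums one-to-one. The variable names are mine, and the inclusion-exclusion cross-check at the end is an extra sanity test that I added, not something the online tool does.

```python
from math import comb

draws, deck_size, others = 10, 60, 48

favorable = 0
combos = 0
for M in range(1, 5):                               # Urza's Mines drawn
    for P in range(1, 5):                           # Urza's Power Plants drawn
        for T in range(1, min(4, draws - M - P) + 1):  # Urza's Towers drawn
            rest = draws - M - P - T                # other cards drawn
            favorable += comb(4, M) * comb(4, P) * comb(4, T) * comb(others, rest)
            combos += 1

p_tron = favorable / comb(deck_size, draws)
print(combos)           # 60
print(f"{p_tron:.1%}")  # 12.6%

# Cross-check: 1 - P(at least one of the three Urza lands is missing).
total = comb(deck_size, draws)
p_check = 1 - 3 * comb(56, draws) / total + 3 * comb(52, draws) / total - comb(48, draws) / total
print(abs(p_tron - p_check) < 1e-12)  # True
```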

Fortunately, the online Deck-u-lator tool does all of this automatically for us. In other words, its use of “number of cards you need” is interpreted as “how many cards you need at least”. By the way, this also means that Deck-u-lator can be used as a hypergeometric calculator as long as you’re only interested in the probability of drawing at least a certain number of cards. This is usually the case in Magic.

Examples for Deck Building

Now let’s put it all into practice. For each example, I will show the formula for the underlying calculation as well as how to input it into the Deck-u-lator. Naturally, both yield the same result.

Example 1: What is the probability of drawing four Chancellor of the Dross and two Soul Spike in your opening hand?

Assuming a 60-card deck that plays four copies of each key card, we need to sum the probability of drawing four Chancellor of the Dross, two Soul Spike, and one other card, and the probability of drawing four Chancellor of the Dross and three Soul Spike:

\frac{ \binom{4}{4} \binom{4}{2} \binom{52}{1} }{ \binom{60}{7} } + \frac{ \binom{4}{4} \binom{4}{3} \binom{52}{0} }{ \binom{60}{7} } = 0.000082\%.

So you’ll draw the lucky 20 damage opening hand once per 1.2 million games on average. I wouldn’t bet on it.
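To double-check a number this small, it helps to keep everything as exact integers for as long as possible. Here is a short Python sketch; the variable names are my own.

```python
from fractions import Fraction
from math import comb

chancellors, spikes, others, hand = 4, 4, 52, 7

# All four Chancellors plus two or three Soul Spikes; a fourth Spike would not fit in 7 cards.
favorable = sum(
    comb(chancellors, 4) * comb(spikes, s) * comb(others, hand - 4 - s)
    for s in range(2, 4)
)
p = Fraction(favorable, comb(chancellors + spikes + others, hand))

print(favorable)            # 316
print(f"{float(p):.6%}")    # 0.000082%
print(round(1 / float(p)))  # about 1.22 million
```

Only 316 of the roughly 386 million possible opening hands qualify.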

Example 2: How many functionally identical combo pieces do you need to hit a 2-card combo with at least 60% consistency by turn 4 on the play?

With 7 copies of each, you are only 53.7% to draw at least one copy of each combo piece in your top 10 cards, so that’s not enough. But with 8 copies of each we have:

\sum_{A=1}^8 \sum_{B=1}^{ \min(8,10-A) } \frac{ \binom{8}{A} \binom{8}{B} \binom{44}{10-A-B} }{ \binom{60}{10} } = 61.3\%.

This underlines the value of having redundancy. To be fair, decks that can easily win without the combo (like Splinter Twin back when it was still legal) don’t necessarily need the highest levels of combo consistency, so they can easily shave a few combo pieces. For decks with no Plan B (such as Ad Nauseam), drawing the combo consistently is more essential.

In any case, most 2-card combo decks run several tutors or card selection spells to add consistency. Unfortunately, there is no easy way to incorporate, say, four Serum Visions into a multivariate calculation. You could get some good insight by supposing that you’d see two extra cards each game, so that you’d run the numbers with 12 cards drawn rather than 10, but that is only an approximation. If you want more precision, you could set up a detailed simulation, but that generally takes a lot of work.
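If you want to try different numbers of copies yourself, a small Python sketch like the following does the search. The function p_both_pieces is a hypothetical helper of my own, and it assumes a 60-card deck with the same number of copies of each combo piece.

```python
from math import comb

def p_both_pieces(copies, draws=10, deck_size=60):
    """P(at least one copy of each of two combo pieces in the top `draws` cards)."""
    others = deck_size - 2 * copies
    favorable = 0
    for a in range(1, min(copies, draws) + 1):
        for b in range(1, min(copies, draws - a) + 1):
            favorable += comb(copies, a) * comb(copies, b) * comb(others, draws - a - b)
    return favorable / comb(deck_size, draws)

for copies in range(4, 11):
    print(copies, f"{p_both_pieces(copies):.1%}")

# Smallest number of copies of each piece that reaches 60% in the top 10 cards.
smallest = next(c for c in range(1, 31) if p_both_pieces(c) >= 0.60)
print(smallest)  # 8

# The "two extra cards" approximation for card selection: look at 12 cards instead of 10.
print(f"{p_both_pieces(8, draws=12):.1%}")
```

The last line only implements the crude 12-card approximation described above. It does not model Serum Visions itself.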

Example 3: For Modern Bogles, what is the probability of drawing a hexproof creature, a land, and an Aura in your opening hand?

Assuming a 60-card deck with eight 1-mana hexproof creatures, 18 lands to cast them, and 19 Auras (not counting Daybreak Coronet, since that requires another Aura to be castable), the probability of drawing at least one of each type of card is given by:

\sum_{B=1}^5 \sum_{L=1}^{6-B} \sum_{A=1}^{7-B-L} \frac{ \binom{8}{B} \binom{18}{L} \binom{19}{A} \binom{15}{7-B-L-A} }{ \binom{60}{7} } = 55.1\%.

Since many hands without at least one copy of each of the three pieces necessitate a mulligan—there are some exceptions involving Kor Spiritdancer or Dryad Arbor, but they’re still not what you’re hoping for against removal-heavy decks—the deck has some consistency issues. Fortunately, a mulligan is a superb card selection tool, and it’s a Bogle player’s best friend.
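The Bogles number can be reproduced with three nested loops that follow the summation bounds directly. This is a sketch with variable names of my own, and it assumes the 8/18/19/15 split described above.

```python
from math import comb

hexproof, lands, auras, others, hand = 8, 18, 19, 15, 7
deck_size = hexproof + lands + auras + others  # 60

favorable = 0
for b in range(1, hand - 1):                  # 1 to 5 hexproof creatures
    for l in range(1, hand - b):              # 1 to 6-b lands
        for a in range(1, hand - b - l + 1):  # 1 to 7-b-l Auras
            rest = hand - b - l - a           # remaining slots filled by other cards
            favorable += (comb(hexproof, b) * comb(lands, l)
                          * comb(auras, a) * comb(others, rest))

p_keepable = favorable / comb(deck_size, hand)
print(f"{p_keepable:.1%}")      # 55.1%
print(f"{1 - p_keepable:.1%}")  # 44.9% of hands are missing at least one piece
```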

Example 4: For Modern Dredge, when you dredge Stinkweed Imp, what is the probability of milling at least one Prized Amalgam along with at least one creature to trigger it?

Let’s use the same approach as I described and defended for the hypergeometric distribution last week. Set aside one Stinkweed Imp from the deck, blindly exile any number of cards from the top of your deck (representing your opening hand and any number of draw steps), then dredge 5. The probability of interest is the same as when we dredge 5 with a 59-card deck containing four Prized Amalgams, eight creatures that can return it, and 47 other cards:

\sum_{P=1}^4 \sum_{C=1}^{5-P} \frac{ \binom{4}{P} \binom{8}{C} \binom{47}{5-P-C} }{ \binom{59}{5} } = 14.2\%.

So the dream will happen only about once per 7 times on average. But you have to keep the faith in your heart. As a wise philosopher once said, “if you let your dreams come true, then happiness will follow you.”
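Here is how the dredge calculation might look in Python. It is a sketch under the same modeling assumption as above (a 59-card library after setting one Stinkweed Imp aside), with names of my own choosing.

```python
from math import comb

amalgams, enablers, others, dredge = 4, 8, 47, 5
library = amalgams + enablers + others  # 59 cards

favorable = sum(
    comb(amalgams, p) * comb(enablers, c) * comb(others, dredge - p - c)
    for p in range(1, 5)               # 1 to 4 Prized Amalgams milled
    for c in range(1, dredge - p + 1)  # 1 to 5-p creatures that can return them
)
p_dream = favorable / comb(library, dredge)
print(f"{p_dream:.1%}")  # 14.2%
```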

Example 5: For Modern Storm, what is the probability of drawing a mana reduction Wizard, a Gifts Ungiven, and at least three lands in your top 12 cards?

Assuming a 60-card deck with six Wizards, four Gifts Ungiven, 17 lands, and 33 other cards, the probability of drawing at least three lands, at least one Wizard, and at least one Gifts is given by:

\sum_{L=3}^{10} \sum_{W=1}^{ \min(6, 11-L) } \sum_{G=1}^{ \min(4, 12-L-W) } \frac{ \binom{6}{W} \binom{4}{G} \binom{17}{L} \binom{33}{12-W-G-L} }{ \binom{60}{12} } = 29.4\%.

While this is only a rough approximation of consistency, you can reasonably expect to see as many as 12 cards by turn 3 on the play in a deck filled with Serum Visions, Opt, and Manamorphose, so this calculation at least comes close to estimating the probability of, barring mulligans, going turn 2 Wizard, turn 3 Gifts.

Admittedly, it’s an overestimation since you won’t have the card selection spell every game and because Serum Visions is not actually Ancestral Recall. Another issue is that if you draw your first Wizard as your very last card, then you can’t deploy it in time to ramp into a turn 3 Gifts. Given the difficulty of enumerating the enormous set of possibilities, incorporating both aspects analytically would be close to intractable. But under the realization that 29.4% is an overestimation, saying that you can go turn 2 Wizard, turn 3 Gifts Ungiven about once per 4 games on average would be a realistic ballpark estimate.
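For those who want to rerun the Storm number with their own land or Wizard counts, here is a minimal Python sketch of the triple sum. The variable names are mine, and the code shares the simplification discussed above: it only counts cards seen and ignores the order in which they arrive.

```python
from math import comb

lands, wizards, gifts, others, seen = 17, 6, 4, 33, 12
deck_size = lands + wizards + gifts + others  # 60

favorable = 0
for L in range(3, 11):                                  # at least three lands
    for W in range(1, min(wizards, seen - 1 - L) + 1):  # at least one Wizard
        for G in range(1, min(gifts, seen - L - W) + 1):  # at least one Gifts Ungiven
            rest = seen - L - W - G
            favorable += (comb(lands, L) * comb(wizards, W)
                          * comb(gifts, G) * comb(others, rest))

p_storm = favorable / comb(deck_size, seen)
print(f"{p_storm:.1%}")  # 29.4%
```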

Example 6: What is the probability of having both of my colors in my opening hand in a 9-8 mana base in Limited?

Barring mulligans, the probability is given by:

\sum_{P=1}^{6} \sum_{M=1}^{7-P} \frac{ \binom{9}{P} \binom{8}{M} \binom{23}{7-P-M} }{ \binom{40}{7} } = 69.2\%.

Although the probability of having both of your colors increases to 87.3% by turn 4 on the play, the probability of seeing both colors in your opening hand is not as consistent: You’re missing at least one color 30.8% of the time. Especially since a large number of these hands would result in a mulligan, that’s an issue. This is why I like to run Boros Guildgate even in 2-color decks.

It’s worth noting that if your opening hand contains only 0 or 1 lands in total, then you obviously don’t have both colors. In such cases, which are encompassed in the 30.8% figure, the problem is not the construction of the colored mana base—it’s just bad luck. In my article on how many colored mana sources you need to consistently cast your spells, I do take this into account by instead presenting the probability of hitting your color conditional on drawing at least a certain number of lands.
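A short Python sketch of the Limited numbers is below. The helper p_both_colors is my own, and the conditional probability at the end is just one possible way to separate mana base construction from plain bad luck, here assuming "at least two lands" as the condition. It is not necessarily the exact conditioning used in the other article.

```python
from math import comb

def p_both_colors(draws, first=9, second=8, deck_size=40):
    """P(at least one land of each color among `draws` cards)."""
    others = deck_size - first - second
    favorable = sum(
        comb(first, p) * comb(second, m) * comb(others, draws - p - m)
        for p in range(1, min(first, draws - 1) + 1)
        for m in range(1, min(second, draws - p) + 1)
    )
    return favorable / comb(deck_size, draws)

p_hand = p_both_colors(7)
p_turn4 = p_both_colors(10)
print(f"{p_hand:.1%}")   # 69.2%
print(f"{p_turn4:.1%}")  # 87.3%

# Condition on drawing at least two of the 17 lands in the opening hand.
# Having both colors already implies at least two lands, so we can simply divide.
p_two_lands = 1 - (comb(23, 7) + 17 * comb(23, 6)) / comb(40, 7)
p_conditional = p_hand / p_two_lands
print(f"{p_conditional:.1%}")
```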

Example 7: For Standard Golgari Midrange, barring mulligans, what is the probability of drawing 1GG mana from your lands on turn 3 on the play?

This problem is more difficult than the preceding ones because there are multiple overlapping ways to generate at least 1GG, and when you have multiple alternatives that might occur together, probabilities get tricky. When events A and B are not mutually exclusive, the probability that A or B occurs is equal to the probability that A occurs plus the probability that B occurs minus the probability that both occur.

Let’s apply this to the question at hand. A typical Golgari Midrange deck might contain 16 green-producing lands and eight other lands. Let’s model this as 16 Forest and 8 Swamp. Now, to have the right mana for casting Jadelight Ranger on-curve, one possibility is to draw at least three Forest. Another is to draw at least two Forest and at least one Swamp. Yet, there is overlap between these sets.

Using the rule I described above, you can use Deck-u-lator by determining the probabilities of the alternatives separately and then subtracting the probability that they happen together. So that’s 55.0% (the chance to get at least two Forest and at least one Swamp in the first nine cards) plus 44.9% (the chance to get at least three Forest in the first nine cards) minus 30.7% (the chance to get at least three Forest and at least one Swamp in the first nine cards). The final answer is 69.2%.

Alternatively, you can express it as:

\sum_{F=2}^{9} \sum_{ S=\max(0,3-F) }^{9-F} \frac{ \binom{16}{F} \binom{8}{S} \binom{36}{9-F-S} }{ \binom{60}{9} } = 69.2\%.
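Both routes to the answer can be checked against each other in code. The Python sketch below uses a hypothetical helper p_at_least of my own, and assumes the simplified 16 Forest, 8 Swamp, 36 other cards model with nine cards seen.

```python
from math import comb

FOREST, SWAMP, OTHER, SEEN = 16, 8, 36, 9
TOTAL = comb(FOREST + SWAMP + OTHER, SEEN)

def p_at_least(min_forest, min_swamp):
    """P(at least min_forest Forests and at least min_swamp Swamps in the top 9)."""
    favorable = sum(
        comb(FOREST, f) * comb(SWAMP, s) * comb(OTHER, SEEN - f - s)
        for f in range(min_forest, SEEN + 1)
        for s in range(min_swamp, min(SWAMP, SEEN - f) + 1)
    )
    return favorable / TOTAL

# Route 1: inclusion-exclusion over the two overlapping alternatives.
a = p_at_least(2, 1)     # two Forest and one Swamp
b = p_at_least(3, 0)     # three Forest
both = p_at_least(3, 1)  # the overlap
by_inclusion_exclusion = a + b - both
print(f"{a:.1%} + {b:.1%} - {both:.1%} = {by_inclusion_exclusion:.1%}")
# 55.0% + 44.9% - 30.7% = 69.2%

# Route 2: the direct double sum.
direct = sum(
    comb(FOREST, f) * comb(SWAMP, s) * comb(OTHER, SEEN - f - s)
    for f in range(2, SEEN + 1)
    for s in range(max(0, 3 - f), min(SWAMP, SEEN - f) + 1)
) / TOTAL
print(f"{direct:.1%}")  # 69.2%
```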

Example 8: What if we also have four Llanowar Elves that are good on turn 1 or 2 but don’t contribute towards producing 1GG when we draw one on turn 3?

What is this, an exam on probability theory? You’re just trying to see an equation that explodes the page, right?

Well, using the same Golgari Midrange deck as before with four Llanowar Elves tossed in, we could already have 1GG between lands and Elves in our first eight cards, or we could have only GG in our first eight cards and then draw any land, or we could have only GB in our first eight cards and then draw a Forest. Using a spreadsheet or program, we have to distinguish between all these possibilities and add up their respective probabilities:

\sum_{F=1}^{8} \sum_{ L=\max(0,2-F) }^{ \min(4,8-F) } \sum_{ S=\max(0,3-F-L) }^{8-F-L} \frac{ \binom{16}{F} \binom{8}{S} \binom{4}{L} \binom{32}{8-F-S-L} }{ \binom{60}{8} } + \sum_{F=1}^{2} \frac{ \binom{16}{F} \binom{8}{0} \binom{4}{2-F} \binom{32}{6} }{ \binom{60}{8} } \frac{24}{52} + \frac{ \binom{16}{1} \binom{8}{1} \binom{4}{0} \binom{32}{6 } }{ \binom{60}{8} } \frac{16}{52} = 78.5\%.

As expected, Llanowar Elves increase our mana consistency.

Example 9: What is the probability of drawing at least one Madcap Experiment, at least four lands, and no more than one of your two Platinum Emperion in your top 11 cards?

This is another one of those calculations that is not easy to do via the Deck-u-lator since you need to not draw a certain set of cards. But assuming a 60-card deck with 4 Madcap Experiment, 25 lands, 2 Platinum Emperion, and 29 other cards, the probability is given by:

\sum_{M=1}^4 \sum_{P=0}^1 \sum_{L=4}^{11-M-P} \frac{ \binom{4}{M} \binom{2}{P} \binom{25}{L} \binom{29}{11-M-P-L} }{ \binom{60}{11} } = 39.7\%.

If you had been playing three Platinum Emperion, then the probability of drawing no more than two while drawing Madcap Experiment and four lands in the top 11 would be 40.4%, which is only a small increase.

Yet the second copy is worth it: If you had been playing only one Platinum Emperion, then the probability of drawing none while drawing Madcap Experiment and four lands in the top 11 would be only 34.4%.
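Calculations with an upper limit on one card type are easy to script. Here is a Python sketch with a hypothetical helper p_madcap of my own. Note my assumption that adding or removing a Platinum Emperion swaps it with one of the "other" cards, so the deck stays at 60 with 4 Madcap Experiment and 25 lands.

```python
from math import comb

def p_madcap(emperions, max_emperions_drawn, seen=11,
             madcaps=4, lands=25, deck_size=60):
    """P(at least 1 Madcap Experiment, at least 4 lands, and at most
    max_emperions_drawn Platinum Emperion among the top `seen` cards)."""
    others = deck_size - madcaps - lands - emperions  # assumption: Emperions replace other cards
    favorable = 0
    for m in range(1, madcaps + 1):
        for e in range(0, max_emperions_drawn + 1):
            for l in range(4, seen - m - e + 1):
                rest = seen - m - e - l
                favorable += (comb(madcaps, m) * comb(emperions, e)
                              * comb(lands, l) * comb(others, rest))
    return favorable / comb(deck_size, seen)

p_two = p_madcap(emperions=2, max_emperions_drawn=1)
print(f"{p_two:.1%}")  # 39.7%

# The three-Emperion and one-Emperion variants, under the swap assumption above.
print(f"{p_madcap(emperions=3, max_emperions_drawn=2):.1%}")
print(f"{p_madcap(emperions=1, max_emperions_drawn=0):.1%}")
```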

Example 10: For Modern Burn, what is the probability of drawing no more than two creatures, exactly two or three lands, and at least five burn spells in your top 10 cards?

This is another one of those calculations that is not easy to do via the Deck-u-lator since you need to not draw a certain set of cards while drawing a specific range of copies for another type of card. These differ from the “I need at least this combination” calculation that the tool is set up to do. But a spreadsheet or program will give the answer fairly easily.

Assuming a 60-card deck with 12 creatures, 20 lands, and 28 burn spells, the probability of drawing no more than two creatures, exactly two or three lands, and at least five burn spells in your top 10 cards is given by:

\sum_{C=0}^{2} \sum_{L=2}^3 \sum_{B=5}^{10-C-L} \frac{ \binom{12}{C} \binom{20}{L} \binom{28}{B} }{ \binom{60}{10} } = 17.3\%.

Note that if we had cut a land for another creature, then this probability would sink to 16.0%. So does that mean that we should be running 20 rather than 19 lands in Burn? Well, it may be a nudge in that direction, but deck building is far more complex than just maximizing the probability of the dream draw, and there are many ways to define the dream draw for Burn.

But in any case, with the multivariate hypergeometric distribution at your fingertips, you can now analyze the consistency of your deck and build an optimal version according to your own criteria. Thanks for reading, and see you again for more math-related fun in 2019!
