A Barrel of Colors and Null Outcomes

 

Suppose we have a virtually infinite number of marbles of 9 different colors in a large barrel, and 30% of the marbles are white. The rest of the colors are evenly distributed amongst the remaining marbles. What is the expected number of random draws from the barrel in order to have a 95% chance of acquiring at least one marble of each of the 8 non-white colors?

 

This is a variation of the "Collector's Problem", which asks how many items (e.g., baseball cards) we would need to collect (randomly) before getting at least one of each category (e.g., player). The only difference is that we're asking for the expected number of draws to achieve 95% confidence (rather than 100%), and also 30% of the trials are "wasted" by the white marbles. Let's first neglect the white marbles (i.e., assume there are none), and deal with the basic problem, and then later discuss how to account for a non-zero percentage of white marbles.

 

Let dm,k denote the expected fraction of the N colors for which we have drawn exactly m marbles after k draws. Obviously after 0 draws we have exactly 0 marbles for 100% of the colors, so d0,0 = 1 and dm,0 = 0 for all m > 0. Thereafter the expected values of dm,k are determined by the recurrence

 

 

where d−1,k is understood to be 0 for all k. The values of dm,k given by this recurrence can be expressed in closed form as

 

 

Since d−1,k = 0 for all k, it follows that the expected fraction of colors that have not been drawn after k draws satisfies the simple recurrence

 

 

and so we have

 

To have 95% confidence of leaving none of the N colors undrawn, we need the expected value of d0,k to be just 5% of 1/N. Thus, with N = 8 colors, the required value of k is

 

 

Now, one approximate way of accounting for the fact that there are actually 30% white marbles in the barrel (so that each time we draw a white marble we have basically just wasted a draw), would be to simply multiply the above result by 10/7, which would give the overall result of 54.296 draws. This means that if we define an experiment as drawing about 55 marbles, and if we perform this experiment 100 times, we would only fail to hit every non-white color on less than 5 of the 100 experiments.

 

However, the above method of accounting for the white marbles is at best an approximation, and it becomes increasingly inaccurate as the proportion of white marbles increases. As stated previously, if there were no white marbles, the expected fraction of (non-white) colors undrawn after k draws would be [1 − (1/N)]k, where N is the total number of non-white colors. The above method of accounting for the 30% white marbles was essentially to say that this k is only 7/10 of the true value, because 30% of our draws are "wasted" on white marbles. So, letting Q denote the fraction of non-white marbles, we solved the equation

 

 

for the exponent kQ and then multiplied by 1/Q to give k = 54.29. The problem with this approach is that the Q factor should really be applied to the transition rate 1/N rather than to the exponent k. In other words, the expected fraction of colors undrawn after k draws is really given by

 

 

Now it's clear how (1) can serve as an approximation for (2), because the binomial expansions of these expressions are

 

 

respectively. These differ only in the second (and higher) order terms, which are negligible if (1/N) is small enough. Of course, the difference also vanishes as Q goes to 1 (no white marbles).

 

Another subtlety is translating the expected value into a probability. If E{k} is the expected fraction of the N colors that have no representatives after k marbles have been drawn, then the expected number of undrawn colors is (N)E{k}. According to a Poisson process, the probability of exactly j colors being undrawn after k marbles have been drawn is therefore

 

 

so the probability of 0 undrawn colors after k draws is simply

 

 

Setting this to 0.95 (to give 95% probability of no undrawn colors), we can solve for k to give

 

 

With N = 8 and Q = 7/10 this gives k = 55.146, which is slightly higher than our earlier approximation of 54.296.

 

We can also consider the probability of any specified partition of m marbles into n different (equally likely) colors. For example, suppose we blindly draw 55 marbles from a barrel that contains a virtually infinite number of marbles of 11 different colors. Of those 55 marbles, what is the probability that 10 are one color, 9 are another color, 8 are another, 7 are another, and so on down to 0? I believe the answer is

 

 

(We might also consider the partition [0,1,2,..,n-1] of n(n-1)/2 marbles into n colors.) In general, suppose there are N equally likely colors and we blindly draw M marbles. These M marbles will be partitioned into S differently colored subsets as follows

 

 

In other words, if we arrange the drawn marbles according to color, then c1 is the number of singles, c2 is the number of doubles, and so on. The probability of this particular partition (without regard to which color corresponds to each subset) is

 

 

 

 

Most of this expression is simply the multinomial coefficient A3 (called M3 in Abramowitz & Stegun but denoted here by A3 to avoid confusion with our parameter M), so we can write the probability as

 

 

The solution of the classic "duplicate birthdays" problem is a very simple special case of this formula. To find the probability of no duplicate birthdays among M < 365 randomly chosen people, we want the partition of M consisting of all 1s. Thus, c1 = M = S and A3 = 1, so the above formula reduces to (365!)/[((365−M)!)(365M)].

 

Return to MathPages Main Menu