A Barrel of Colors and Null Outcomes 

Suppose we have a virtually infinite number of marbles of 9 different colors in a large barrel, and 30% of the marbles are white. The rest of the colors are evenly distributed amongst the remaining marbles. What is the expected number of random draws from the barrel in order to have a 95% chance of acquiring at least one marble of each of the 8 nonwhite colors? 

This is a variation of the "Collector's Problem", which asks how many items (e.g., baseball cards) we would need to collect (randomly) before getting at least one of each category (e.g., player). The only difference is that we're asking for the expected number of draws to achieve 95% confidence (rather than 100%), and also 30% of the trials are "wasted" by the white marbles. Let's first neglect the white marbles (i.e., assume there are none), and deal with the basic problem, and then later discuss how to account for a nonzero percentage of white marbles. 

Let d_{m,k} denote the expected fraction of the N colors for which we have drawn exactly m marbles after k draws. Obviously after 0 draws we have exactly 0 marbles for 100% of the colors, so d_{0,0} = 1 and d_{m,0} = 0 for all m > 0. Thereafter the expected values of d_{m,k} are determined by the recurrence 



where d_{−1,k} is understood to be 0 for all k. The values of d_{m,k} given by this recurrence can be expressed in closed form as 



Since d_{−1,k} = 0 for all k, it follows that the expected fraction of colors that have not been drawn after k draws satisfies the simple recurrence 



and so we have 


To have 95% confidence of leaving none of the N colors undrawn, we need the expected value of d_{0,k} to be just 5% of 1/N. Thus, with N = 8 colors, the required value of k is 



Now, one approximate way of accounting for the fact that there are actually 30% white marbles in the barrel (so that each time we draw a white marble we have basically just wasted a draw), would be to simply multiply the above result by 10/7, which would give the overall result of 54.296 draws. This means that if we define an experiment as drawing about 55 marbles, and if we perform this experiment 100 times, we would only fail to hit every nonwhite color on less than 5 of the 100 experiments. 

However, the above method of accounting for the white marbles is at best an approximation, and it becomes increasingly inaccurate as the proportion of white marbles increases. As stated previously, if there were no white marbles, the expected fraction of (nonwhite) colors undrawn after k draws would be [1 − (1/N)]^{k}, where N is the total number of nonwhite colors. The above method of accounting for the 30% white marbles was essentially to say that this k is only 7/10 of the true value, because 30% of our draws are "wasted" on white marbles. So, letting Q denote the fraction of nonwhite marbles, we solved the equation 



for the exponent kQ and then multiplied by 1/Q to give k = 54.29. The problem with this approach is that the Q factor should really be applied to the transition rate 1/N rather than to the exponent k. In other words, the expected fraction of colors undrawn after k draws is really given by 



Now it's clear how (1) can serve as an approximation for (2), because the binomial expansions of these expressions are 



respectively. These differ only in the second (and higher) order terms, which are negligible if (1/N) is small enough. Of course, the difference also vanishes as Q goes to 1 (no white marbles). 

Another subtlety is translating the expected value into a probability. If E{k} is the expected fraction of the N colors that have no representatives after k marbles have been drawn, then the expected number of undrawn colors is (N)E{k}. According to a Poisson process, the probability of exactly j colors being undrawn after k marbles have been drawn is therefore 



so the probability of 0 undrawn colors after k draws is simply 



Setting this to 0.95 (to give 95% probability of no undrawn colors), we can solve for k to give 



With N = 8 and Q = 7/10 this gives k = 55.146, which is slightly higher than our earlier approximation of 54.296. 

We can also consider the probability of any specified partition of m marbles into n different (equally likely) colors. For example, suppose we blindly draw 55 marbles from a barrel that contains a virtually infinite number of marbles of 11 different colors. Of those 55 marbles, what is the probability that 10 are one color, 9 are another color, 8 are another, 7 are another, and so on down to 0? I believe the answer is 



(We might also consider the partition [0,1,2,..,n1] of n(n1)/2 marbles into n colors.) In general, suppose there are N equally likely colors and we blindly draw M marbles. These M marbles will be partitioned into S differently colored subsets as follows 



In other words, if we arrange the drawn marbles according to color, then c_{1} is the number of singles, c_{2} is the number of doubles, and so on. The probability of this particular partition (without regard to which color corresponds to each subset) is 





Most of this expression is simply the multinomial coefficient A_{3} (called M_{3} in Abramowitz & Stegun but denoted here by A_{3} to avoid confusion with our parameter M), so we can write the probability as 



The solution of the classic "duplicate birthdays" problem is a very simple special case of this formula. To find the probability of no duplicate birthdays among M < 365 randomly chosen people, we want the partition of M consisting of all 1s. Thus, c_{1} = M = S and A_{3} = 1, so the above formula reduces to (365!)/[((365−M)!)(365^{M})]. 
