Colored Balls in Urns

Given N urns and B balls of each of K colors, randomly distribute the KB balls in the N urns such that no two balls of identical color are placed in the same urn. What is the probability that at least one ball of each color will be in an urn by itself?

We know that the total number of distinct ways we can distribute the balls in the urns is C(N,B)^K where C(n,m) is the binomial coefficient n!/(m!(n–m)!). Each of these arrangements is equally likely, so we need only determine how many of these arrangements satisfy the special condition that at least one ball of each color is in an urn by itself. If we denote this number by Q(N,B,K) then we clearly have

for all N,B with N ≥ B. However, for more than two colors this becomes progressively more complicated. We might try to determine a generating function, but a more simple-minded method of determining the number of arrangements of K distinct sets of B balls into N urns such that each color is in at least one urn by itself is to enumerate the partitions of K(B–1) into N–K or fewer parts no greater than K, then append K 1's, and then evaluate the combinations of colors.

To illustrate, let's compute the probability for 9 urns with 4 balls in each of 3 colors. The partitions of 9 into 6 or fewer parts no greater than 3, augmented with three 1's, are shown below along with their respective enumerations. (To save space, let {mn} denote the binomial coefficient C(m,n) and let [mn\jk] = ((m!n!)/(j!k!)).)

Thus the probability of distributing 12 balls, 4 in each of 3 colors, into 9 urns such that each color is by itself in at least one urn is 1174320/2000376 = 2330/3969 = 0.587049635...

In principle this method can be used to compute any Q(N,B,K), but it becomes increasingly laborious as the values of N, B, and K increase. Is there a generating function for these numbers?

A related, but more tractable, problem is to compute the expected number of urns containing exactly 1 red ball, and exactly 1 green ball, etc. This can be computed fairly easily by a recursive method because at each point we only need to keep track of the number of empty urns, the number of urns with more than one ball, the number of urns with exactly 1 red, the number with exactly 1 green, and so on. The expected number of urns containing exactly one of a given color is the same for each color, and is equal to

Suppose we are interested in just approximate results for parameters in the range of N = 100 to 1000, K = 8 to 32, and we want to determine the optimum value of B. For parameters like these it's possible to come up with a reasonable estimate without too much trouble. Suppose N = 100, K = 8, and B = 10. Begin by distributing the 10 white balls. Now what is the probability that after distributing the other seven colors we will have "covered" every one of the 10 white balls? This depends on what percentage of the urns will get hit by one of the remaining seven colors.

We know that each color covers 10% of the urns (in this case), and that the coverages are mutually independent, so (in the limit for large N) we can just take the union, which is 1 – (1 – 0.1)⁷. This means that about 52 of the urns will be covered by these seven colors. (We don't cover 70 urns with these 70 balls because they overlap.)

The total number of arrangements of the original 10 balls and the 52 "covered" urns is C(100,10)C(100,52). On the other hand, the number of these arrangements for which all 10 white balls are in a covered urn is just C(52,10)C(100,52), so the probability that all 10 of the white balls are covered is just C(52,10)/C(100,10), which is about 0.000913.

Now, the situation is clearly symmetrical in the eight colors, so each one has probability of 0.000913 of being covered. Thus the probability that one or more of the colors gets completely covered is just the union of these events. Of course, they are not strictly independent, but for parameters in the range of interest they are essentially independent. Therefore the probability of failure (i.e., one or more colors completely covered) is 1 – (1 – 0.000913)⁸, which equals 0.00728.

Needless to say, there are several approximations buried in this procedure, and their validity would have to be evaluated for each set of parameters, but for the range of parameters mentioned above this approach gives fairly good results. To formulate it explicitly, note that the "52" above was really rounded off, and the fractional part was thrown away. This was done to give a nice integer argument for the binomial function C(m,n). However, to avoid that round off, let's use the gamma function Γ to give our factorials, so we can define g(x,y) = Γ(x+1)/(Γ(y+1)Γ(x-y+1)).

What we computed above was the probability of failure, whereas P(N,B,K) was defined as the probability of success, i.e., every color has at least one urn to itself. So the above result is P(100,10,8) = 0.99272. With this convention the general (approximate) formula for P(N,B,K) is

When K is very small (e.g., 2), the overall exponent "K" should really be K-1, because the degrees of freedom require the two colors to cover each other. However, for K greater than about 4 the independence is adequate to justify the full exponent K. This formula gives the following results for N = 100 and K = 8:

Notice that the above function is a very nice "bathtub curve".

Return to MathPages Main Menu