Suppose there exist n (maybe 100) baseball players, each of whom has his face on baseball cards. A fan can buy cards, one at a time. Each purchase has an equal chance of being any of the players (no matter how many cards are bought). How many cards must the fan buy in order to have a complete set with probability p (say, 0.99)? This is related to the well-known problem of determining the *expected* number of purchases necessary to acquire a complete set of cards, i.e., at least one of each of the n available types. The expected number of purchases is E(n) = n(1 + 1/2 + 1/3 + ... + 1/n) as discussed in the note Collecting k Complete Sets of Coupons. However, it's somewhat more difficult to determine the number of purchases necessary to give a certain probability of having a complete set. In a sense, this is a variation of the generalized "birthday problem", with 100 possible baseball cards instead of 365 possible birthdays. In general terms, if there are L distinct possible outcomes (each assumed to be equally probable), the problem is to determine the probability of a particular set of outcomes in N trials. If N is small, then the "brute force" method can be used to compute the exact answer. To take a simple example, suppose N=5. The seven possible partitions of 5 are 5 = [5] = [4+1] = [3+2] = [3+1+1] = [2+2+1] = [2+1+1+1] = [1+1+1+1+1] These partitions correspond to all possible outcomes. The first represents the cases when all 5 trials have the same outcome. The second represents the case when four of the five outcomes were the same and one was different. The third represents the case when three of the trials had one outcome and the other two trials had another outcome, and so on. Each of these seven outcomes has a definite probability. For example, the probability of outcome [5] is simply (1/L)^4, and the probability of outcome [1+1+1+1+1] is given by (L-1)!/((L-5!)*(L)^4). Let s(N,k) denote the number of components in the kth partition. The kth partition of N will consist of a certain number of 1s (i.e., unique outcomes), a certain number of 2s (i.e., "doubles"), a certain number of 3s (i.e., "triples"), and so on. Let a_t denote the number of t-tuples in the kth partition. Then define function N! M(N,k) = --------------------------- N PROD ((t!)^(a_t))*((a_t)!) t=1 This is called the M_3 multinomial coefficient in "The Handbook of Mathematical Functions" by Abramowitz and Stegun (chapter 24 on Combinatorial Analysis). Then I believe the exact probability of Q[N,k] (the kth partition of N) is given by M(N,k) L! Pr{Q[N,k]} = ----------------------- (L-s(N,k))! L^N Once we have computed the probability of each partition, we can then add up the probabilities for the partitions having any particular property of interest, such as those for which all possible outcomes occurred at least once during the N trials (e.g., you have at least one of the 100 available basaeball cards). This approach is easy if N is small. The difficulty is that the number of possible partitions of N (of L or fewer components) increases very rapidly as N gets larger. For example, p(180) = 684,957,390,936. Therefore, computing the EXACT rational answers for large values of N may be computationally intractable, although you could probably derive some good asymptotic formulas. Feller's "Introduction to Probability" discusses approximate solutions of this general problem. Notice that the solution of the "standard" birthday problem (to find the probability of NO duplicate results) is a special case of the above formula. In this case we want the probability of the single partition consisting of all 1s. Here we have M(N,k)=1 and s(N,k)=N, so the answer to the standard birthday problem reduces to the well-known formula Pr{no duplicates} = L!/[(L-N)!*L^N]. To answer the more specialized question of the probability of at least one of each possible outcome, a Markov model is convenient. The states can be distinguished by a single number, the number of outcomes that have occurred so far. The 100x100 transition matrix is 0.01 0 0 0 0 . . . 0 0 0 0.99 0.02 0 0 0 . . . 0 0 0 0 0.98 0.03 0 0 . . . 0 0 0 0 0 0.97 0.04 0 . . . 0 0 0 0 0 0 0.96 0.05 . . . 0 0 0 . . . . . . . . . . . . . . . . . . . . . . . . 0 0 0 0 0 0.98 0 0 0 0 0 0 0 0.02 0.99 0 0 0 0 0 0 0 0.01 1.00 The probability of having seen all possible outcomes after n trials is the [100,1] component of M^n. For n=2^k I get the following results n Pr 1 0.000000 2 0.000000 4 0.000000 8 0.000000 16 0.000000 32 0.000000 64 0.000000 128 0.000000 256 0.000137 512 0.556018 1024 0.996647 2048 0.999999 4096 1.000000 8192 1.000000 With this method you can determine the precise value of n for which the probability first exceeds any specific value.

Return to MathPages Main Menu