And Then There Were None

Suppose we have n objects, each with a 1/m chance of disappearing in 
any given second.  What is the mean number of seconds until all of 
them are gone?

This may seem like a well-posed question, but it's actually ambiguous, 
because it doesn't specify whether "1/m per second" is to be applied 
continuously as a constant rate, discretely over a sequence of 
1-second intervals, or according to any of the infinitely many other 
distributions that yield a 1/m chance of disappearing in any given 
second.  (There is also ambiguity in the conditioning, i.e., whether 
the 1/m applies only to objects that haven't already disappeared; 
alternatively, an object could have an unconditional probability of 
exactly 1/m of disappearing during each of m intervals.)  If we're 
looking for an exact answer, the precise interpretation of the 
"rate" is important.

The continuous constant-rate interpretation is easiest to solve.
The process can be viewed as a linear Markov chain with n+1 states,
where the initial state has all n objects present, the next state
has n-1 objects present, and so on.  The transition from the state
with k objects to the next state occurs at a constant exponential
rate of k(1/m), and therefore has a mean transition time of m/k.
The mean time to pass from the initial state with k=n down to the
empty state with k=0 is just the sum of the individual mean
transition times, which is

             m (1 + 1/2 + 1/3 + 1/4 + ... + 1/n)
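
As a quick numerical check, here is a small Python sketch that
simulates the continuous-rate model by drawing exponential lifetimes
with mean m and averaging the maximum lifetime over many trials.
(The function names and trial count are just illustrative choices.)

    import random

    def simulated_extinction_time(n, m, trials=200_000):
        # Each object's lifetime is exponential with rate 1/m, and the
        # system is empty once the longest-lived object disappears.
        total = 0.0
        for _ in range(trials):
            total += max(random.expovariate(1.0 / m) for _ in range(n))
        return total / trials

    def harmonic_mean_time(n, m):
        # The closed form m(1 + 1/2 + ... + 1/n) for comparison.
        return m * sum(1.0 / k for k in range(1, n + 1))

    print(simulated_extinction_time(4, 5))   # roughly 10.4
    print(harmonic_mean_time(4, 5))          # 125/12 = 10.4166...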

The purely discrete interpretation leads to a somewhat more complicated 
problem (as usual).  Each individual object has a probability of 1.0 
of being present at the beginning of the first 1-second interval, and 
a probability of (m-1)/m of being present at the end of the first 
interval.  In general, each object has a probability of [(m-1)/m]^k 
of being present at the end of the kth interval, and so its 
probability of NOT being present at the end of the kth interval 
is simply the complement of this value, i.e., 1 - [(m-1)/m]^k.

It follows that the probability of ALL n objects being gone at the end 
of the kth interval is the product of their individual probabilities of 
being gone, so it is given by {1 - [(m-1)/m]^k}^n.  Of course, this is 
the cumulative probability, which approaches 1.0 as k increases, but 
we need to know the probability of the system entering this "empty 
state" during the kth second.  Thus we need to subtract the probability 
of being in the empty state at the end of the (k-1)th interval.  This 
gives us the probability of entering the "empty state" during the kth 
discrete 1-second interval:

 P{k} =  {1 - [(m-1)/m]^k}^n  -  {1 - [(m-1)/m]^(k-1)}^n

The mean number of intervals required to reach the empty state is 
then given by the weighted sum

                     inf
         k_mean  =  SUM  k P{k}
                     k=1

For small values of n we can just algebraically expand the expression 
for P{k} into a sum of powers of (m-1)/m, and then determine the 
closed-form expression for the infinite sum of powers.  However, for 
large values of n this becomes tedious, and it may be best to simply 
compute the sum directly - or approximate it by the continuous 
exponential solution given above.
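
Computing the sum directly is straightforward; here is a minimal
Python sketch of that approach (the truncation point for the
infinite sum is an arbitrary choice):

    def k_mean_direct(n, m, terms=10_000):
        # Sum k*P{k}, where P{k} = (1 - r^k)^n - (1 - r^(k-1))^n.
        r = (m - 1) / m
        total, prev = 0.0, 0.0       # prev holds (1 - r^(k-1))^n
        for k in range(1, terms + 1):
            cur = (1.0 - r**k) ** n  # prob. all n gone by end of interval k
            total += k * (cur - prev)
            prev = cur
        return total

    print(k_mean_direct(4, 5))       # approximately 9.8363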

Here's a little table of k_mean for n objects with 1/m chance of 
disappearing during each discrete 1-second interval:

                             m
       ---------------------------------------------
  n      1       2        3          4           5
 ---   -----  -------  -------  ----------  ----------
  1      1       2        3          4           5
  2      1      8/3     21/5       40/7        65/9
  3      1     22/7    477/95    1780/259    1595/183
  4      1   368/105  6963/1235 50128/6475 221405/22509

The difference between the continuous and discrete interpretations
can be seen by comparing the answers for 4 objects that each have 
a probability of 1/5 of disappearing during each second (on the
condition that they haven't already disappeared).  The solution
for the discrete interpretation is 

            k_mean = 221405/22509 = 9.836287 sec

whereas the solution for the continuous-rate interpretation is

  k_mean = 5 [ 1 + 1/2 + 1/3 + 1/4 ] =  125/12  = 10.416666 sec

so they differ by 0.58 seconds, nearly 6%.
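
A Monte Carlo run of the discrete model gives the same picture.  The
following Python sketch (with an arbitrary trial count) lets each
surviving object disappear with probability 1/m in every interval
and averages the number of intervals until none remain:

    import random

    def simulate_discrete(n, m, trials=200_000):
        total = 0
        for _ in range(trials):
            alive, t = n, 0
            while alive > 0:
                t += 1
                # Each survivor independently disappears with probability 1/m.
                alive = sum(1 for _ in range(alive)
                            if random.random() >= 1.0 / m)
            total += t
        return total / trials

    print(simulate_discrete(4, 5))   # roughly 9.84, versus 10.42 continuous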

Incidentally, we can simplify the formula

             inf
 k_mean  =  SUM   k [ (1 - r^k)^n  - (1 - r^(k-1))^n ]
             k=1

where r = (m-1)/m, by noting that the first few terms of the 
summation are

           1(1 - r^1)^n  -  1(1 - r^0)^n

         + 2(1 - r^2)^n  -  2(1 - r^1)^n

         + 3(1 - r^3)^n  -  3(1 - r^2)^n

         +         etc.

Taking advantage of the cancellation between consecutive terms, we can
express the sum of the first t terms as

                              t-1
            t(1 - r^t)^n  -  SUM  (1 - r^k)^n
                              k=1

Adding and subtracting t-1, this can be written as

                                t-1
      t(1 - r^t)^n  - (t-1) +  SUM  (1 - (1 - r^k)^n)
                                k=1

In the limit as t goes to infinity the combination of the first two
terms goes to 1 (because t r^t goes to zero, so t(1 - r^t)^n
approaches t), and we are left with the result

                        inf
      k_mean  =  1  +  SUM  (1 - (1 - r^k)^n)
                        k=1

Noting that the summand would be 1 with k=0, we can bring the
1 inside the summation and write

                        inf
          k_mean  =    SUM  (1 - (1 - r^k)^n)
                        k=0
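
This form is also convenient for computation.  A short Python sketch
(again with an arbitrary truncation point) gives the same values as
the weighted sum:

    def k_mean_tail_sum(n, m, terms=10_000):
        # Sum over k >= 0 of the probability that at least one object
        # is still present at the end of the kth interval.
        r = (m - 1) / m
        return sum(1.0 - (1.0 - r**k) ** n for k in range(terms))

    print(k_mean_tail_sum(4, 5))     # approximately 9.8363, as before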

The formula is confirmed by considering the simple case when n=1, for
which we know the mean survival time is just m.  In this case
the summand in the preceding formula reduces to r^k, so by
the geometric series we have the sum 1/(1-r) = m.

In the same way we can easily derive simple closed-form expressions 
for small values of n.  For example, if n=2 the summand reduces
to 2r^k - (r^2)^k, and so the infinite sum is 2/(1-r) - 1/(1-r^2),
which equals m(3m-2)/(2m-1).  In general, for any fixed n, we have
the closed-form expression

                  C(n,1)       C(n,2)                 C(n,n)
  k_mean[n]  =   --------  -  -------- + ... -(-1)^n --------
                 1 - r^1      1 - r^2                 1 - r^n

where C(i,j) is the binomial coefficient.  For example, with n=5 we 
have
                  5       10        10         5         1      
  k_mean[5]  =  ----- - ------- + ------- - ------- + -------
                1 - r   1 - r^2   1 - r^3   1 - r^4   1 - r^5
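
As a check, this closed form reproduces the table entries given
earlier exactly.  Here is a brief Python sketch using exact rational
arithmetic (the function name is just an illustrative choice):

    from fractions import Fraction
    from math import comb

    def k_mean_closed_form(n, m):
        # Sum over k=1..n of (-1)^(k+1) C(n,k) / (1 - r^k), r = (m-1)/m.
        r = Fraction(m - 1, m)
        return sum(Fraction((-1)**(k + 1) * comb(n, k)) / (1 - r**k)
                   for k in range(1, n + 1))

    print(k_mean_closed_form(4, 5))  # 221405/22509
    print(k_mean_closed_form(3, 4))  # 1780/259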

In terms of m the expression for k_mean[5] can be written in the form

 k_mean[5] =

   /    10m      10m^2          5m^3               m^4            \
 m( 5 - ---- + --------- - -------------- + --------------------   )
   \    2m-1   3m^2-3m+1   4m^3-6m^2+4m-1   5m^4-10m^3+10m^2-5m+1 /

It's interesting that for sufficiently large m (meaning a LOW rate
of disappearance) we can neglect all but the leading term in each
denominator, and so the expression reduces to

             /     10     10     5     1  \
           m( 5 - ---- + ---- - --- + ---  )
             \      2      3     4     5  /

which, not surprisingly, is exactly equal to the solution for the 
case of a constant exponential rate

             /     1     1     1     1  \
           m( 1 + --- + --- + --- + ---  )
             \     2     3     4     5  /

Thus we're led to the nice little combinatorial identity

            n   1          n              C(n,k)
          SUM  ---   =   SUM  (-1)^(k+1) --------
           k=1  k         k=1               k
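
This identity is easy to confirm directly; here is a small Python
check using exact rational arithmetic (the range of n tested is an
arbitrary choice):

    from fractions import Fraction
    from math import comb

    def identity_holds(n):
        # Harmonic number H_n versus the alternating binomial sum.
        lhs = sum(Fraction(1, k) for k in range(1, n + 1))
        rhs = sum(Fraction((-1)**(k + 1) * comb(n, k), k)
                  for k in range(1, n + 1))
        return lhs == rhs

    print(all(identity_holds(n) for n in range(1, 25)))   # True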

