## Bernoulli Trials

```If the probability of an individual event occurring on a given trial
is P, then within a sequence of S trials what is the probability that
AT LEAST Y of these events will occur IN A ROW?

These are called Bernoulli trials.  Let Pr{k,q,n} denote the
probability of at least one "run" (i.e., string of consecutive
successes) of length n in a sequence of k trials, given that
the probability of success on each trial is q.  By applying the
inclusion-exclusion principle it's easy to show that the values
of Pr{k,q,n} satisfy the recurrence

Pr{k,q,n} = Pr{k-1,q,n} + (q^n)(1-q) [1 - Pr{k-n-1,q,n}]       (1)

with the initial values

Pr{j,q,n} = 0       for j = 0,1,..,n-1

Pr{j,q,n} = q^n     for j = n

Equation (1) is convenient for computing the values of Pr{k,q,n}
for k=n,n+1,...  recursively, but it also allows us to give an
explicit expression for Pr{k,q,n} in terms of k, q, and n.  For
this purpose it's convenient to deal with the probabilities of the
complementary event, i.e., the event of having NO run of length n.
For any given values of q and n, let S[k] denote the probability
1 - P{k,q,n}.  Substituting into equation (1) gives the simple
homogeneous recurrence relation

S[m]  =  S[m-1] - (q^n)(1-q) S[m-n-1]                 (2)

whose characteristic polynomial is

x^(n+1)  -  x^n  +  (q^n)(1-q)  =  0                (3)

Obviously x=q is a root.  Letting r_1, r_2,..,r_n denote the remaining
roots, the values of S[k] are given explicitly by

S[k]  =  C_1 (r_1)^k  +  C_2 (r_2)^k  +  ...  C_3 (r_n)^k

where the C_i are constant coefficients determined by the initial
conditions  S[0]=S[1]=..S[n-1]=1  and  S[n] = 1 - q^n.

For example, if n=2 and q=1/2, the probability that a sequence of k
trials will contain NO two consecutive successes is exactly
_           _                _           _
/ 5 + 3/5 \  / 1 + /5 \ k    / 5 - 3/5 \  / 1 - /5 \ k
S[k]  =  ( --------- )( -------- )  + ( --------- )( -------- )
\   10    /  \    4   /      \   10    /  \    4   /

Similar formulas can be given for any specified run length n and
success probability q.  However, it's usually more efficient to use
equation (1) or (2) to compute the values recursively.

The above derivation assumed the roots r_i of equation (3) are
distinct.  To prove this, note that a polynomial f(x) has a multiple
root r iff f(r)=f'(r)=0.  With f(x) = x^(n+1) - x^n + K  we have
f'(x) = (x^n)((n+1)x - n), so the only possible multiple roots would
be x=0 or x=n/(n+1), which requires q = 0, 1, or n/(n+1).  So, if q
is anything other than one of these three special values, the roots
of f(x) are distinct.  (Of course, f(x) always has a factor of (x-q),
so even in a case like n=2, q=2/3 when f(x) has a double root the
polynomial f(x)/(x-q) has distinct roots.)
```