Discrete Circular Distributions |
|
Consider a circular arrangement of D bins, into which are placed n marbles randomly with uniform distribution. What is the probability that all the marbles will be located within N consecutive bins? If N is less than or equal to D/2, the answer is easy to see, as discussed in another note. Briefly, the question is equivalent to whether n randomly placed arc spans of length N all have some common overlap, and we can consider how many of the Dn different arrangements of n such arcs intersect in at least one bin. Focusing on just one particular bin, we know there are Nn distinct arrangements for which every arc includes that bin. Multiplying this by D, the number of bins, gives a first estimate of the total number of arrangements with at least one bin in common. Each arrangement with exactly one bin in common has been counted exactly once in this estimate, but each arrangement with two bins in common has been counted twice (once for each of the common bins). Likewise each arrangement with three common bins has been counted three times, and so on. To eliminate the duplicates, consider the number of arrangements that have (at least) two particular adjacent bins in common. The number of such arrangements for any particular pair of adjacent bins is clearly (N−1)n, and we can again multiply this by D to give an estimate of the total number of such arrangements. This counts the arrangements with exactly two common bins just once, but it counts those with three common bins twice, and so on. Therefore, subtracting this number from the previous number, and dividing the result by Dn (the total number of all possible arrangements) gives the probability of overlap between all n of the arcs |
|
|
|
But, again, this applies only if N is less than or equal to D/2. The question is how to generalize this for cases when N is greater than D/2. The solution for continuous circular arcs is well known, but the discrete case seems to be less so. At one extreme, with N = D−1, the answer is easy to see. Letting cj denote the event that none of the marbles is located in the jth cell (bin), the event of interest is the union of these n events, so the probability of all n marbles being located in an arc of length N = D−1 is |
|
|
|
Hence the probability is |
|
|
|
In the case N = D – 2 we need to assess the probability of the event |
|
|
|
The first-order terms are all products of two distinct elements, and there are D such terms, so they contribute D[(D−2)/D]n to the probability, but the second-order terms are more complicated, because the first-order terms are not all mutually independent. There are a total of D(D−1)/2 second-order terms, and of these, D consist of products of three distinct elements, and D(D−3)/2 consist of products of four distinct elements. Proceeding to the third order terms, there are a total of D(D−1)(D−2)/6, consisting of D products of four, D(D−4) products of five, and D(D−4)(D−5)/6 products of six distinct elements. We can continue to find higher order terms, until all combinations have been exhausted, and then consolidate terms, noting that the kth order terms consisting of j products contribute |
|
|
|
to the total probability. The same approach can be taken for any N, by setting the D basic elements to the D products of D-N consecutive elements ci. As an example, suppose we wish to determine the formula for the probability of all n marbles lying within a span of N=12 cells in a circle of D=16 cells. We seek the probability of the event |
|
|
|
Applying inclusion-exclusion, we get the following numbers of kth order products of j basic elements: |
|
|
|
Naturally, the sum of the terms on the kth row is the binomial coefficient C(16,k). From this table we can infer that, given a circle of 16 cells, the probability of all n marbles lying within 12 consecutive cells is |
|
|
|
In the same way, we can determine the probabilities with smaller values of D but the same ratio of N/D. This gives |
|
|
|
To see the correspondence with the continuous solution, consider the first terms of these expressions, which are multiples of quantities of the form |
|
|
|
where we have put q = N/D and α = 1/D, and we have expanded the expression in powers of α. The coefficient of each of these quantities is simply D = 1/α, so in the limit as D goes to infinity (and a goes to zero) these terms represent simply the quantity nqn−1. The second terms are multiples of quantities of the form |
|
|
|
At this state it would be tempting to think that the coefficients of these terms are |
|
|
|
but in fact to give agreement with the preceding results we must subtract D/2, and we can clearly add or subtract any constant multiple of D/2 without affecting the limiting value, since the ratio of (2N−D−K)/(2N−D) for any constant K still goes to 1 as N and D increase to infinity. Thus the actual coefficients of the second terms are given by |
|
|
|
The third terms (if present) are multiples of quantities of the form |
|
|
|
Again the temptation is to assume the coefficients of these terms are the reciprocal of the leading factor in this expression, i.e., |
|
|
|
But again, in order to agree with the coefficients derived by inclusion-exclusion, we must deduct a constant from each of the factors in the square. In this case we subtract 1 from the first and 2 from the second, giving the coefficients of the third terms as |
|
|
|
The pattern for higher order coefficients is clear, and of course for any given N and D we will have only k non-zero terms, where kN−(k−1)D must be greater than or equal to k. Hence the number of terms is the greatest integer less than or equal to D/(D−N+1). This explains why, if N is less than or equal to D/2, there is only one term (as we discussed originally). With these coefficients, the continuous case for n points on a circle all lying within an arc of q times the whole perimeter is given by the well-known formula |
|
|
|
where jmax is the greatest integer less than but not equal to 1/(1−q). In our previous example we had N/D = q = 3/4, so jmax = 3, and this formula gives |
|
|
|
which agrees exactly with the result of taking the limit of the discrete cases with N/D = 3/4 as D goes to infinity. |
|
|
|
where jmax is the greatest integer less than or equal to D/(D−N+1). Letting M denote D−N, this can be written more succinctly as |
|
|
|
To illustrate the application of this formula, suppose there is a circular arrangement of D = 26 cells, and n marbles are distributed randomly (with uniform density) into these cells. To determine the probability that all n marbles will lie within a span of N = 19 cells, we have jmax = 3, because this is the greatest integer less than or equal to 26/(26−19+1). We also have M = 26−19 = 7, so the probability can be expressed as |
|
|
|
A plot of this function is shown below. |
|
|
|
As we would expect, the probability is 1 for n = 1, 2, or 3, because any three points on a circular set of 26 cells lie within a span of 19. |
|