Discrete Circular Distributions

Discrete Circular Distributions

Consider a circular arrangement of D bins, into which are placed n marbles randomly with uniform distribution. What is the probability that all the marbles will be located within N consecutive bins? If N is less than or equal to D/2, the answer is easy to see, as discussed in another note. Briefly, the question is equivalent to whether n randomly placed arc spans of length N all have some common overlap, and we can consider how many of the Dⁿ different arrangements of n such arcs intersect in at least one bin. Focusing on just one particular bin, we know there are Nⁿ distinct arrangements for which every arc includes that bin. Multiplying this by D, the number of bins, gives a first estimate of the total number of arrangements with at least one bin in common. Each arrangement with exactly one bin in common has been counted exactly once in this estimate, but each arrangement with two bins in common has been counted twice (once for each of the common bins). Likewise each arrangement with three common bins has been counted three times, and so on. To eliminate the duplicates, consider the number of arrangements that have (at least) two particular adjacent bins in common. The number of such arrangements for any particular pair of adjacent bins is clearly (N−1)ⁿ, and we can again multiply this by D to give an estimate of the total number of such arrangements. This counts the arrangements with exactly two common bins just once, but it counts those with three common bins twice, and so on. Therefore, subtracting this number from the previous number, and dividing the result by Dⁿ (the total number of all possible arrangements) gives the probability of overlap between all n of the arcs

But, again, this applies only if N is less than or equal to D/2. The question is how to generalize this for cases when N is greater than D/2. The solution for continuous circular arcs is well known, but the discrete case seems to be less so. At one extreme, with N = D−1, the answer is easy to see. Letting c_j denote the event that none of the marbles is located in the jth cell (bin), the event of interest is the union of these n events, so the probability of all n marbles being located in an arc of length N = D−1 is

Hence the probability is

In the case N = D – 2 we need to assess the probability of the event

The first-order terms are all products of two distinct elements, and there are D such terms, so they contribute D[(D−2)/D]ⁿ to the probability, but the second-order terms are more complicated, because the first-order terms are not all mutually independent. There are a total of D(D−1)/2 second-order terms, and of these, D consist of products of three distinct elements, and D(D−3)/2 consist of products of four distinct elements. Proceeding to the third order terms, there are a total of D(D−1)(D−2)/6, consisting of D products of four, D(D−4) products of five, and D(D−4)(D−5)/6 products of six distinct elements. We can continue to find higher order terms, until all combinations have been exhausted, and then consolidate terms, noting that the kth order terms consisting of j products contribute

to the total probability. The same approach can be taken for any N, by setting the D basic elements to the D products of D-N consecutive elements c_i. As an example, suppose we wish to determine the formula for the probability of all n marbles lying within a span of N=12 cells in a circle of D=16 cells. We seek the probability of the event

Applying inclusion-exclusion, we get the following numbers of kth order products of j basic elements:

Naturally, the sum of the terms on the kth row is the binomial coefficient C(16,k). From this table we can infer that, given a circle of 16 cells, the probability of all n marbles lying within 12 consecutive cells is

In the same way, we can determine the probabilities with smaller values of D but the same ratio of N/D. This gives

To see the correspondence with the continuous solution, consider the first terms of these expressions, which are multiples of quantities of the form

where we have put q = N/D and α = 1/D, and we have expanded the expression in powers of α. The coefficient of each of these quantities is simply D = 1/α, so in the limit as D goes to infinity (and a goes to zero) these terms represent simply the quantity nqⁿ⁻¹. The second terms are multiples of quantities of the form

At this state it would be tempting to think that the coefficients of these terms are

but in fact to give agreement with the preceding results we must subtract D/2, and we can clearly add or subtract any constant multiple of D/2 without affecting the limiting value, since the ratio of (2N−D−K)/(2N−D) for any constant K still goes to 1 as N and D increase to infinity. Thus the actual coefficients of the second terms are given by

The third terms (if present) are multiples of quantities of the form

Again the temptation is to assume the coefficients of these terms are the reciprocal of the leading factor in this expression, i.e.,

But again, in order to agree with the coefficients derived by inclusion-exclusion, we must deduct a constant from each of the factors in the square. In this case we subtract 1 from the first and 2 from the second, giving the coefficients of the third terms as

The pattern for higher order coefficients is clear, and of course for any given N and D we will have only k non-zero terms, where kN−(k−1)D must be greater than or equal to k. Hence the number of terms is the greatest integer less than or equal to D/(D−N+1). This explains why, if N is less than or equal to D/2, there is only one term (as we discussed originally). With these coefficients, the continuous case for n points on a circle all lying within an arc of q times the whole perimeter is given by the well-known formula

where j_max is the greatest integer less than but not equal to 1/(1−q). In our previous example we had N/D = q = 3/4, so j_max = 3, and this formula gives

which agrees exactly with the result of taking the limit of the discrete cases with N/D = 3/4 as D goes to infinity.

where j_max is the greatest integer less than or equal to D/(D−N+1). Letting M denote D−N, this can be written more succinctly as

To illustrate the application of this formula, suppose there is a circular arrangement of D = 26 cells, and n marbles are distributed randomly (with uniform density) into these cells. To determine the probability that all n marbles will lie within a span of N = 19 cells, we have j_max = 3, because this is the greatest integer less than or equal to 26/(26−19+1). We also have M = 26−19 = 7, so the probability can be expressed as

A plot of this function is shown below.

As we would expect, the probability is 1 for n = 1, 2, or 3, because any three points on a circular set of 26 cells lie within a span of 19.

Return to MathPages Main Menu