Probabilities for Splitting Sets

If the integers 1 to 9 are randomly distributed into three sets of three integers, what is the probability that at least one of the sets contains only odd integers? (Sometimes this question is expressed in terms of the product of the numbers in a given set being odd, but of course this is true if and only if all the numbers in the set are odd.) We assume that the distribution of digits is uniform. The answer is not difficult to find, but the problem can be generalized, and the general method of solution has interesting similarities to the sum-of-all-paths method in quantum field theory.

Oddly enough, the correct numerical answer (5/14) has been given in various sources, but often with erroneous justification. Of course, we can simply examine all 9! = 362880 permutations of the 9 digits, and determine that 129600 of them have purely odd values for the first three, middle three, or last three digits. However, we would like a direct analytical derivation. One reference begins by noting that the number of distinct sets of three integers taken from the digits 1 to 9 (without duplication and without regard to order) is the binomial coefficient C(9,3) = 84, and the number of those containing only odd integers (i.e., chosen from the five digits 1, 3, 5, 7, 9) is C(5,3) = 10. Then the reference announces that “There are three groups to be formed, so the answer is 3(10/84) = 5/14”. This happens to yield the correct numerical value, but the rationale is unsatisfactory. It’s true that if we create a partition by first randomly selecting three digits, the probability that this initial triple consists entirely of odd integers is 10/84, but it is not valid to say that we repeat this three times to create an entire partition, thereby giving a total probability of 3(10/84). The triples are not statistically independent. One could justify the solution 3C(5,3)/C(9,3) for this specific problem by noting first that the number of distinct splittings of the 9 digits, without regard to order of the sets or the order of digits within each set, is C(9,3)C(6,3)C(3,3)/3!, and then noting that the number of such splittings with exactly one purely odd set is C(5,3)[C(6,3)C(3,3)/2!]. The ratio of these is indeed 3C(5,3)/C(9,3). Of course, to cite this as the answer to the original question, we rely on the fact that there can’t be more than one purely odd set. In more general problems this does not apply, so this solution method is somewhat ad hoc.

For a more general solution method, we first note that the integers from 1 to 9 contain five odd integers and four even integers. For convenience, let ijk denote a given “state” of three sets of up to three integers, where i, j, and k are the numbers of even integers in the three sets respectively. Since order doesn’t matter, we will always list these in monotonic decreasing order. For example, prior to placing any even integers in the sets, we have the state 000. After placing an even integer into one of the sets, we have the state 100 with probability 1. The next even integer could be placed in the set that already contains an even integer, in which case we would have the state 200. On the other hand, the second even integer could be placed in one of the previously empty sets, in which case we would have the state 110. The probabilities of these two transitions are 2/8 and 6/8 respectively, representing the fractions of the open slots that lead to the respective states. Continuing in this way, we can randomly place all four even numbers, with the state transitions and probabilities shown below.

This has obvious similarities to a Markov model. The probability of each state equals the sum of the probabilities of each predecessor state multiplied by the respective transition probability. (In a Markov model, the rate of change of the probability of each state would be given by this quantity.) For example, Pr{210} equals (6/7)Pr{200} plus (4/7)Pr{110}. Note that the sum of the probabilities on each level equals 1. The only state for which none of the sets contains only odd numbers is 211, and we can easily compute the probability of this state as the sum of the products of the transition rates along each possible path from 000 to 211. There are three such paths, listed below along with their probabilities:

The total probability of state 211 is the sum of these three contributions, which gives 36/56 = 9/14. The complement of this is the probability of states 310 and 220, for each of which there is one set with no even integers, meaning there is a set with all odd integers. Therefore, we’ve found that if the integers 1 to 9 are randomly distributed into three sets of three integers, the probability that at least one of the sets contains only odd integers is 5/14. This can also be computed directly as the sum of the probabilities of state 310, which is 2/14, and the probability of state 220, which is 3/14.

One way of formalizing this computation (which is useful for more complicated problems of this type, to be discussed later), is by means of matrices. We can encode each phase of the transitions described above in terms of a suitable matrix, and then simply multiply all the matrices together to give the probabilities of all possible outcomes. For example, in the simple problem above, the probabilities of states 300, 210, and 111 are given in terms of the probabilities of states 200 and 110 by the equations

In matrix form this can be written as

In general, the transition from a phase of n states to a phase of m states is represented by a mxn matrix, and the columns are unitary (i.e., sum to 1) with a single common denominator (equal to the total number of open spaces in the entry states of that phase). The product of all these matrices gives the desired result. Thus, for the simple problem described above, the complete calculation can be expressed as

For a less trivial example, instead of splitting the integers 1 to 9 into three sets of three, we will consider splitting the integers 1 to 16 into four sets of four. In other words, we pose the question: If the integers 1 to 16 are randomly distributed into four sets of four integers, what is the probability that at least one set contains only odd integers? For this problem each “state” can be represented by a four-digit number ijkm, where i, j, k, and m represent the number of even integers in the four sets respectively, listed in monotonic decreasing order. Thus the completely empty state is denoted as 0000, and after placement of an even integer into one of the sets we get the state 1000, and so on. Since there are eight even integers in the range from 1 to 16, we need to consider the distributions of eight even integers into the four sets. At the nth level, the number of distinct states is equal to the number of partitions of n into four parts (allowing 0) no greater than 4. The states at each of the levels for this problem are listed below.

The transitions from the states of one column to the states of the next column could be drawn, but the resulting drawing would be difficult to read, since the transition paths at the higher levels are more numerous and cross each other. However, using matrices we can specify the transitions between columns more easily. The first transition matrix is trivially T₀ = [16]/16, and the next three transitions are easily seen to be given by the matrices

The next two transitions are given by the matrices

Finally, the last two transitions are given by

Multiplying these together, we get the probabilities of each of the eight possible fully-populated states

Since each of the phase transitions is “unitary”, the sum of the probabilities is obviously 1. The states with at least one purely odd set are those with at least one “0” in their representations (meaning they have zero even integers in a set), i.e., the states 4400, 4310, 4220, and 3320. These have a combined probability of 329/2145, which is approximately 0.1533799... If we asked for the probability that exactly one set is purely odd, we would exclude state 4400 (which has two purely odd sets), so the answer would be 328/2145.

As an aside, note that the reasoning given in the reference mentioned previously would imply that the answer to this problem would be 4C(8,4)/C(16,4) = 330/2145 = 2/13. This differs from the correct answer by 1/2145, which demonstrates that the reasoning in that reference is not generally applicable, despite the fact that it happens to give the right numerical answer in the simple case involving just the digits 1 to 9. As explained above, that simple formula relied on the fact that, in the 1-9 problem, there was just a single purely odd state. In the 1-16 problem this is not true, because of state 4400. One might say the simple formula counts the case 4400 twice, since this case has two purely odd sets. Recall that (in the simplistic formula) the numerator represents one purely odd set combined with three non-odd sets whose number is divided by 3! since the order of the non-odd sets doesn’t matter. But for the 4400 state we must divide by another factor of 2, to account for the fact that we can swap the explicit purely odd set with the purely odd set among the group of three supposedly non-odd sets. To use this “simple” method to solve the problem we would need to justify this reasoning and independently determine the probability of state 4400. Hence the “simple” method would not be so simple. In such cases, the full solution method (e.g., using the transition matrices) described above seems preferable, especially since it provides complete knowledge of the individual probabilities of each individual state.

For another example, suppose the integers 1 to 12 are randomly split into four sets of three integers. What is the probability that at least one of the sets will be purely odd? Just as before, the initial empty state is represented by 0000, and the five possible fully populated states (after distributing the six even numbers among the four sets) are 3300, 3210, 3111, 2220, and 2211. The initial transition matrix is trivially T₀ = [12]/12 and the next three transition matrices are

The two remaining transition matrices are

Based on these transitions, the probabilities of the five fully populated states are given by

For this problem the states with one or more purely odd set(s) are 3300, 3210, and 2220, so the probability is (1+36+18)/154 = 45/154.

An alternative way of formalizing problems of this kind is to represent each state by the valence counts. If each set contains room for N elements, we can represent each state by an N+1 valence counts, signifying the numbers of sets containing 0, 1, 2, ..., N even integers, respectively. For example, the state 3300 contains two 3’s and two 0’s, so we could represent this state as [2,0,0,2,0]. The format is convenient because the rules for generating the successor states and the associated probabilities are easy to express and implement in these terms. The state with all four sets empty is represented by [4,0,0,0,0], and thereafter the contributions to the successor states of any state [v_0,v_1,v_2,v_3,v₄] are given uniquely by deducting 1 from each non-zero valence index (except for the highest) and adding it to the next higher valence index.

To illustrate, consider again the original example of splitting the integers from 1 to 9 into 3 sets of 3. In terms of valencies, the states [v_0,v_1,v_2,v₃] are as shown below, where the kth column represents the valency states after the kth even number has been placed, although now we extrapolate to allow filling all nine with “even numbers”, as shown by the paths shown below.

These could just as well be the valencies for arbitrary widgets. The state 3000 on the left represents the state in which all three sets contain zero widgets, whereas the state 0003 on the right represents the state in which each of the three sets contains 3 widgets. The state with all three sets empty can be denoted as [3,0,0,0], and thereafter the contributions to the successor states of any state [v_0,v_1,v_2,v₃] are given uniquely by deducting 1 from each non-zero valence index (except for the highest) and adding it to the next higher valence index. For example, if v₂ is non-zero, the contribution from this state to the successor state [v₀, v₁, v₂−1, v₃+1] is

where

To make the calculation of the probabilities explicit, we can write Pr{[v_0,v_1,v_2,v₃]} as the sum of the contributions of its predecessor states

excluding the cases with negative indices, by the equation

Thus we have

Also we have

For one more example, going past the bounds of the “even numbers”, to distribute arbitrarily many widgets, we have

In this way we can determine all the transition factors and compute the probabilities for the distributions. In fact, we can continue to populate the subsets to complete the distribution of widgets, until every subset is filled. The contribution factors and configuration probabilities are shown below.

Naturally the probabilities are symmetrical, because filling from left to right is equivalent to removing from right to left. It’s interesting that although the splitting probabilities in the two directions are quite different, they yield exactly the same pattern of cell probabilities.

For another example, consider the integers 1 to 25, and suppose we split these into five sets of five integers. In this case we can write Pr{[v_0,v_1,v_2,v_3,v₄, v₅]} as the sum of the contributions of its predecessor states

excluding the cases with negative indices, by the equation

where (again) it is understood that terms with negative indices are excluded. Using this approach, the scaled probabilities for each of the 20 fully populated states (after randomly distributing 12 even integers into the five sets) are

The states with all odd integers are those with no even number in at least one set, so they are the states that have v₀ > 0. Adding up the probabilities of those states, we find that the probability of at least one purely odd set is 12506/104006, which is approximately 0.120243...

One of the interesting aspects of this class of problems is that it resembles in some respects the behavior of quantum fields. Recall that, according to Feynman’s approach to quantum electrodynamics, the quantum amplitude for a particle to go from event A to event B is equal to the sum of the products of the probabilities along each possible path from A to B (with a coupling constant for each multiplication). In our analysis above we found that the probability of reaching a given final state from the initial state is the sum of the products of the transition probabilities along each possible path from the initial state to the final state. Also, the fact that adding entities from left to right is formally identical to adding holes (i.e., removing entities) from right to left is reminiscent of Dirac’s original “hole” conception of anti-particles.

Return to MathPages Main Menu