Hierarchical Models and Cascading Inferences 

Falstaff: My lord, the man I know. 
Prince Henry: I know thou dost. 
Shakespeare 

When entering a conference of logicians, each participant is given a card stuck to his forehead (which he can’t see). Each is told that the card is either white or gray, and is asked not to reveal anything about the color of anyone’s card to any other participant. All the cards are gray. As the participants arrive, they enter a conference hall, where they mill around, meeting and greeting each other. The main presentation screen shows a large digital clock, and instructs each participant to be seated the next time the minute digits change if at that time he can logically conclude that his card is gray. After all 35 participants have arrived, they continue milling around and socializing for over an hour. Then an announcement is made, stating that at least one of the cards is gray. The participants continue milling around for another 35 minutes, and at which time they all take their seats. 

This sequence of events may seem surprising at first, because evidently the announcement that at least one card is gray has triggered some (albeit delayed) logical inferences, and yet, even before the announcement, each participant can obviously see for himself that all 34 of the other cards are gray. Thus it might seem that the announcement has not imparted any new information, so it can’t have any effect. 

To understand what’s happening, it’s useful to begin by considering cases with fewer participants. Obviously if there were just a single logician at the conference, he wouldn’t know that he has a gray card, so he would never sit down, until the announcement is made that there is at least one gray card. At that point he can deduce that his card must be gray, so he will sit at the first opportunity. 

For a less trivial example, suppose there are two logicians at the conference. Each one sees that the other has a gray card, so they each know there is at least one gray card, but they don’t know that the other participant knows this. Without an announcement, they would never sit down, because they can continue indefinitely to believe it’s possible that they themselves have a white card. However, once the announcement is made that there is at least one gray card, each logician knows that if his own card is white, the other logician must sit at the next opportunity, because that other logician will be forced to conclude that his own card must be gray. It follows that, after the first minute when neither logician sits down, each one must conclude that his own card is gray (because if it was white, the other logician would have sat down at the first minute), and therefore both must sit down at the second minute. 

Now consider a conference of three logicians. Each one sees that the other two have gray cards, and he can (initially) hypothesize that he himself has a white card. On the basis of this hypothesis, the other two logicians would see that his card is white, so he would be irrelevant to their deliberations. Those two would proceed just as in the preceding case, i.e., they would remain standing at the first minute (after the announcement), then infer that they have gray cards from that fact that neither of them sat down, and hence they would both sit at the second minute. However, if they do not sit down at the second minute, the third logician must conclude that his hypothesis is wrong, and that in fact he has a gray card. The situation is entirely symmetrical, so all three of them would reason the same way, and hence they all three would sit down at the third minute. 

In general, knowing that a group of N logicians (all with gray cards) will all sit at the Nth minute after the announcement, it follows that a group of N+1 logicians (all with gray cards) will all sit at the (N+1)th minute following the announcement. As noted above, this often strikes people as paradoxical, because in a conference of many logicians, not only do they each know (prior to the announcement) that all others have gray cards, they also know that everyone else knows that there are gray cards (because they can each see many of the same people), and everyone knows that everyone knows that there are gray cards, and so on. Thus, letting A, B, C, ... denote the conference participants, we can say 

 A knows there is at least one gray card. 
 A knows that B knows there is at least one gray card. 
 A knows that B knows that C knows there is at least one gray card. 
etc. 

For example, in a conference of just five logicians, A, B, C, D, and E, it’s clear that A knows there are at least four gray cards, because he can see that B, C, D and E are gray. Also, A knows that B knows there are at least three gray cards, because A and B can both see that C, D, and E are gray. Furthermore, A knows that B knows that C knows there are at least two gray cards, because those three can all see that D and E are gray. Finally, A knows that B knows that C knows that D knows that there is at least one gray card, because those four can all see that E is gray. However, there is no card visible to all five participants, so A does not know that B knows that C knows that D knows that E knows that there are any gray cards. 

This is why, prior to the announcement, there is no chain of logical implication by which the participants can definitely infer anything from the fact that no one has taken a seat. Participant A can hypothesize that his card is white. Also, A knows that B cannot see his own card, so A knows that B could hypothesize that B’s card is white as well. Therefore, from A’s perspective, it is possible that B thinks that A and B are white, and of course A knows that B knows that C cannot see his own card, so C could hypothesize that his card is white also. Therefore A can think that B can think that C can think that their cards are all white. Extending this reasoning, we see that A can think B can think C can think D can think E can think all their cards are white. 

Under this hierarchical chain of embedded hypotheses, E actually sees that A, B, C, and D have white cards, and hypothesizes that his own card is white. However, as soon as the public announcement is made, stating that at least one card is gray, E is forced to conclude (according to this chain of embedded hypotheses) that his hypothesis is wrong, and his card must be gray, so he will sit down at the next minute. If he does not sit, participant D, who hypothetically sees A, B, and C with white cards, is forced to conclude that his hypothesis is wrong (because E must see a gray card, and D sees A,B,C with white cards), so his card must be gray, and hence he must sit down at the next minute. If he doesn’t sit, then the hypothetical C must conclude that his hypothesis is wrong, and his card must be gray, so he will sit down at the next minute. And so on. Thus if no one sits down for the first four minutes, then A must conclude that his hypothesis is wrong, and his card must be gray, so he must sit down at the 5^{th} minute. Each of the other four participants reasons similarly, so they each sit down at the 5^{th} minute. The same kind of reasoning applies to any number of participants. 

This is an interesting scenario because each participant must construct a hierarchical sequence of embedded models of the other participants. To formalize this, we could let A denote a Boolean variable signifying that logician A can believe his card is white, and we can let A{B} denote another Boolean variable signifying that A can believe B can believe that B’s card is white. Likewise A{B{C}} signifies A can believe B can believe C can believe that C’s card is white, and so on. In each of these nested models, the initial hypothesis is that each modeled participant in the string actually sees all the higherlevel modeled participants as having white cards, and hypothesizes that his own card is white. With N participants, each one can construct (N−1)! such nested models of the entire population. For example, with N=4, the participant denoted by A can consider 3! = 6 Boolean variables of the form A{B{C{D}}}, A{B{D{C}}}, A{C{B{D}}}, A{C{D{B}}}, A{D{B{C}}}, and A{D{C{B}}}. Prior to the announcement, all six of these are true, but after the announcement, all six of them are false, because A knows that the last modeled individual in each chain cannot think their card is white while also seeing that all the higherlevel cards are white. Therefore, the last individual must either sit down at this point, or else the model of the individual at the next higher lever in the sequence must conclude that his own card is gray. For example, the variable A{B{C{D}}} after the announcement becomes false, and then after D does not sit down it becomes true again but the variable A{B{C}} becomes false, and then after C does not sit down the variable A{B} becomes false, and then after B does not sit down the variable A becomes false, at which point A sits down (as do all the other participants, based on similar reasoning). 

Prior to the announcement, the hierarchical models that A has constructed (with N = 4) can be depicted as shown below, where each region represents a model of the participant embedded within the model of another participant, and so on, up to participant A. (Similar model can be drawn for each of the other participants.) 



The figure below shows the status of the models after the announcement (state 1), after the first minute when no one sits (state 2), after the second minute when no one sits (state 3), and after the third minute when no one sits (state 3). 



In state 4, participant A cannot believe his card is white, so he must sit at the 4^{th} minute. Likewise the other participants conclude the same thing at this stage, so they all sit. Notice that in state 1 (after the announcement) the lowest level models are colored gray based on the higher level hypotheses, but when they don’t sit at the first minute, the second lowest level can believe that the lowest level believes it has a white card (so it turns white again in our figure), at the expense of invalidating its hypothesis about its own card. Hence the second lowest level turns gray. At the next minute the models are revised again, and the gray coloring moves up another level, and so on, until finally A himself must conclude that his card is gray (as do all the other participants). At this point all the participants sit, so all the regions turn gray. 

This logical puzzle has been presented in many different forms. Perhaps the most common is in terms of blueeyed islanders. The story is that a volcanic island contains 500 browneyed natives, 400 greeneyed natives, and 100 blueeyed natives. The native religion prohibits anyone from knowing their eye color, or from talking about the eye color of anyone else, or from looking at their reflection. According the strict rules, if anyone ever discovers their own eye color, they must fling themselves into the volcano at midnight of that night. Life goes on fine for many years, but then one day a foreign visitor gives a speech to the assembled natives, and lets slip the fact that at least one of the natives has blue eyes. One hundred days later, at midnight, all the blueeyed natives fling themselves into the volcano. 

At a conference of 1000 cosmologists, the participants would presumably all take their seats immediately, because, by the Cosmological Principle, every participant would assume the world looks the same to everyone, and since everyone he sees has a gray card, he assumes the same is true for what everyone else sees. On the other hand, a conference of 1000 probability theorists would present special problems, because each Bayesian might infer that the probability of being the only white card (assuming he is not a special person) is extremely low, so he might sit immediately, whereas the frequentists might dispute the validity of this reasoning in a unique situation, and refuse to ever take their seats. 

More seriously, it’s interesting to consider the discrete nature of the decision process in the original example. Each logician was required to make a determination once per minute about whether he knows, based on the information available up to that time, if he has a gray card. With N participants this means that all the participants would sit down N minutes after the announcement is made. But if we had specified that each logician is required to make a determination once per nanosecond, then all would sit down after only N nanoseconds. This is not realistic, because human brains cannot think – or sit – so quickly. Still, we could change to a scenario involving highspeed computers, and instead of sitting down, the computers simply broadcast a signal when they have determined the “color of their card”. This would make possible a very highspeed version of this game, but there would still be limits. 

If, in the original logician’s conference, we had eliminated the clock and the discrete time intervals, and simply instructed the participants to sit if and when they can conclude that their card must be gray. This is problematic, because even if we assume each logician’s logic is perfect, we don’t know how quickly each one will carry out and update his assessment as time goes by and no one sits down. Each participant also knows that the other participants don’t know how much time the other participants are taking, so this seems to create a perpetual delay. Each participant needs to wait the longest possible time to give the slowest of the other participants time to respond, and none of them can be sure that they themselves are the slowest, so they must always wait for someone else to sit, and hence none of them will ever sit. 

There are some interesting parallels between these hierarchical logical models and the traditional formalism of quantum mechanics. We might compare the initial models of each participant with the wave function of observers in quantum mechanics. We may have a hierarchy of observers, such as a scientist inside the sealed capsule with Schrodinger’s cat, and another scientists outside the capsule in the laboratory, and another scientists outside the laboratory, and so on. When discrete observations are made, it seems as if the wave function collapses onto an eigen vector. One of the puzzling aspects of thought experiments like Schrodinger’s cat is that they seem to imply a multiplicity of realities at lower levels, somewhat similar to the multiplicity of logical models of the participants in our conference. The collapse of all the hypothetical models is triggered by the introduction of just a single bit of “common knowledge”, namely, the fact that at least one card is gray, that extends over all the levels. In the absence of that bit of common knowledge, no participant that can be seen by everyone. Given N participants, there are 2^{N} subsets (including the empty set), and the larger the subset, the fewer people can be seen by every member of the subset. 
