Infinite Descent versus Induction |
|
It is sometimes said that the principle of "infinite descent", developed by Fermat in his study of Diophantine analysis, is equivalent to induction. In assessing this claim we first need to understand what kind of "equivalence" is being claimed. Typically those who claim such an equivalence will support it by re-writing a proof by descent in the form of a proof by induction, by means of some set of logical connectives and the other primitive axioms of Peano arithmetic. However, it's obvious that since Peano's axioms for the natural numbers consist of nothing other than |
|
(i) the naturals have a first member |
(ii) each member has a unique successor |
(iii) there are no loops in the chain of successors |
(iv) mathematical induction |
|
the entirety of Peano arithmetic, including every theorem of Peano arithmetic, can be derived from these axioms. Would it be suitable to replace (iv) with "infinite descent"? If the two are completely equivalent, then the answer would have to be yes, but I would argue that the real answer is no. It is true that the axiom of induction is implicit in the principle of infinite descent, but it’s just as true that the other three axioms are implicit in the principle of infinite descent. This means that infinite descent is of a higher order than these basic axioms, and is constructed by invoking a combination of them. From this point of view, it makes no more (and no less) sense to say that infinite descent is equivalent to induction than to say that infinite descent is equivalent to the unique successor axiom, or to the first member axiom. All of these axioms (plus the tacit axioms of logic) are implicated in the principle of infinite descent, but no one of them individually can rightly be said to be "equivalent" to infinite descent. |
|
Those who argue that infinite descent is equivalent to induction typically refer to a proof such as the following for the fact that the sum of the first n integers is n(n+1)/2: |
|
If not, there is a least integer, say N, for which this is false. N ≠ 0, since 0 = 0(0+1)/2. Since N−1 < N, we must have the sum of the first N−1 integers is (N−1)N/2 and adding N to both sides and doing obvious arithmetic, we reach the contradiction that the sum of the first N numbers is N(N+1)/2. |
|
However, this example is actually a proof by induction, not a proof by infinite descent. The proof supposes the existence of a least integer N such that the sum from 1 to N is not equal to N(N+1)/2. So far so good. A proof by infinite descent would then show that there is an absolutely smaller number with the same property. This is the key distinction: the number must have a smaller magnitude than N, not just precede N in the ordered sequence of numbers. |
|
Pure induction is a chain of implications on the truth values of a given ordered sequence of propositions, and the induction is "activated" by citing the truth value(s) of one (or more) particular propositions (such as the case N=0 in the above example), from which the others follow. Notice that the actual magnitudes of the indices of the sequence are irrelevant; all that matters is their discrete order. (For a related discussion, see Representing Sets of Pure Order). |
|
In contrast, a proof by infinite descent operates essentially on the magnitudes of the indices of an ordered sequence of propositions, and shows that the indices cannot all differ by more than any fixed finite magnitude. This is a subtle but important concept, analogous to the different kinds of justifications that can be given for the fact that the slope of y = x2 is 2x. We know that for some small non-zero increment dx the slope near any given x is approximately equal to [(x+dx)2 − x2]/dx, which is 2x + dx. Now, on one level it seems that we can just set dx equal to 0, and we're done. However, the key step in our derivation involved a division by dx, so we really need to invoke a rigorous "infinite descent" argument leading to the concept of a limit for a sequence of values of dx approaching nearer to zero than any fixed finite magnitude. Thus, there is some validity in the claim that Fermat's infinite descent concept is one of the proto-types for reasoning about infinitesimals and limiting sequences, which is perhaps not surprising in view of Fermat's own contributions to the foundation of the differential calculus. |
|
To illustrate more clearly the distinction between pure induction and infinite descent, let's define the cumulative sum S(k) by the recurrence S(k) = S(k-1) + k with S(0) = 0, and suppose that S(N) = N(N+1)/2 + D for some non-zero integer D. The question is, can we necessarily construct a number with the same property but a smaller absolute value? There may be a way, but we can't simply use the number N−1, because this doesn't ensure a smaller magnitude, i.e., we can continue to subtract 1 indefinitely (into negative values), and the numbers at some point become larger in magnitude, even though they are lesser in value. Thus, we may be able to use a chain of implication from N to N−1 as part of an induction argument, but not an infinite descent (at least not without some other constraint on the indices). |
|
Notice that if N has the above property then simple subtraction gives S(N-1) = S(N) − N = (N−1)N/2 + D, so N−1 has the same property. Therefore, if we can find any integer k such that S(k) = k(k+1)/2 we can assert that D = 0 for all integers (positive and negative.) However, this is not a proof by infinite descent, it is a proof by induction from the particular value of k for which we know that S(k) is of the form k(k+1)/2. |
|
In contrast, consider a proof by infinite descent that √2 is irrational. We suppose there exist a pair of positive integers n0,n1 such that 1 + (n1/n0) = √2. Now consider the integer n2 = n0 − 2n1. It's easy to verify that 1 + (n2/n1) = √2, from which it follows that n2 = (√2−1) n1. Now, we have the inequality 0 < (√2−1) < 1, from which it follows that 0 < n2 < n1. Repeating this procedure, we have n3 = n2 − 2n1, and so on. Thus, we have an infinite sequence of positive integers n0, n1, n2,... of strictly decreasing magnitude, which is impossible because there are only a finite number of positive integers less than any given integer. This is an example of a genuine proof by infinite descent. |
|
Much of the confusion in this area arises from a failure to clearly distinguish between descent and infinite descent. We often see classifications of various "induction hypotheses" allowing us to assert that the proposition P(n) is true for all natural numbers n if P(0) is true and any of these conditions are satisfied for every k: |
|
WEAK P(0) and ( P(k−1) → P(k) ) |
STRONG P(0) and P(1) and ... and P(k−1) → P(k) |
DESCENT -P(0) or -P(1) or ... or -P(k−1) ← -P(k) |
|
Here the notation -P signifies “not P”. Whether it is appropriate to say that these conditions are "equivalent" depends on precisely what kind of equivalence we mean, because they also involve logical connectives and other common notions, but in any case this definition of descent is certainly close to pure induction, since it is simply a contrapositive formulation of strong induction (i.e. Q → P is equivalent to -Q ← -P, its contrapositive). |
|
However, the above definition of descent is quite distinct from the principle of infinite descent. Here we see a reason for much of the confusion in this area, because people often fail to recognize this distinction (especially since the term "descent" is often used as shorthand for "infinite descent"). |
|
It's important to keep in mind that the essential distinction between "induction" and "infinite descent" is not whether the argument proceeds in an "upward" or "downward" direction, nor whether it's applied in a positive or negative logical sense. The essential distinction is that induction relies on a principle of discrete order and completeness, whereas "infinite descent" represents a higher-order principle of discrete absolute magnitude combined with something very much like the pigeon-hole principle. |
|
The above schema for "descent" clearly doesn't embody the principle of "infinite descent", because it simply asserts that if the integer n does not have a specific property P, then at least one of the integers 0 to n-1 must also not have the property P. From this it follows that 0 must not have the property. Thus, if we happen to know that 0 does have the property, we have a contradiction. This is indeed equivalent to induction, but it isn't an "infinite descent" argument. |
|
A true argument by "infinite descent" doesn't assert something about the set of numbers 0 to n−1, it asserts the existence of an integer that is both non-zero and smaller in magnitude than n. Specifically, it asserts that if an integer n has the property P, then there exists a non-zero integer of magnitude less than |n| with the same property. This implies an infinite sequence of strictly decreasing integer magnitudes beginning with |n|, which is impossible. Notice that the argument does not depend on knowledge of the P-status of any particular number. Also, notice that we can apply this form of argument in any context where every element has a non-negative "norm" (i.e., magnitude), and the difference between any two distinct magnitudes is greater than some fixed constant. |
|
To express the essence of induction and infinite descent a little more precisely, let's first consider the classical notion of mathematical induction in a context slightly beyond the naturals, namely, the integers. An inductive argument relies on the fact that we can arrange the elements of some infinite set in a definite ordered linear sequence, one that can be indexed with the integers. Thus we have (in general) a "doubly infinite" ordered sequence of elements |
|
..., e[-2], e[-1], e[0], e[1], e[2], ... |
|
and we can assert that this sequence contains all the elements e[] of the set in question. Now, induction says that if the possession of a specific property P by the element e[j] for any arbitrary integer j implies that e[j+1] also has the property P, and if we can establish for some specific integer k that e[k] possesses the property P, then every element e[j] for j greater than or equal to k possesses the property P. |
|
Needless to say, we can replaces e[j+1] with e[j-1], and replace "greater than" with "less than" in the above proposition, and the resulting proposition also holds. Thus, we can "ascend" or "descend" with this argument. Furthermore, we can also logically negate either of these propositions (using De Morgan's rule) to arrive at equivalent true statements. However, in all cases, the statements produced in this way lead us to a definite conclusion only with the addition of knowledge of the P-status of at least one particular element. Thus, the essential technique of mathematical induction consists of establishing the P-status of one or more element(s) e[k] by prior means, and then invoking some general relation between the P-status of one element and the P-status of some others in the ordered sequence of elements. Notice that we haven't used any notion of the relative "magnitudes" of the elements e[] (they may not even have magnitudes). We have simply used the ordering and completeness of the sequence. |
|
Now let me describe my conception of "infinite descent", which seems to be different than some popular conceptions. We have (in general) an infinite set of elements, each of which has a finite non-negative "magnitude" (norm), and the difference between any two distinct magnitudes is greater than some fixed finite quantity D. Notice that these elements need not be given in any particular linear order. For example, we might have elements (e.g., Gaussian integers) that can be indexed by two integers |
|
etc. |
etc... e[-2,-2] e[-1,-2] e[0,-2] e[1,-2] e[2,-2] ...etc. |
etc... e[-2,-1] e[-1,-1] e[0,-1] e[1,-1] e[2,-1] ...etc. |
etc... e[-2, 0] e[-1, 0] e[0, 0] e[1, 0] e[2, 0] ...etc. |
etc... e[-2, 1] e[-1, 1] e[0, 1] e[1, 1] e[2, 1] ...etc. |
etc... e[-2, 2] e[-1, 2] e[0, 2] e[1, 2] e[2, 2] ...etc. |
etc. |
|
Suppose we can show that if any one of these elements has the property P, then the property P must also be possessed by another element whose magnitude is strictly less than the magnitude of the former. Clearly this is impossible, because it implies an infinite descent in steps no smaller than D, beginning from a finite starting magnitude, whereas there are only finitely many distinct magnitudes less than any given magnitude. |
|
It might seem this argument can be re-cast in the form of a simple induction on the linear sequence of magnitudes, but it can't. To see why, suppose we arrange the elements of our set according to their magnitudes, and we get something like this |
|
0 1 sqrt(2) |
------- ------- -------- |
e[1, 0] e[-1,-1] |
e[0, 0] e[0, 1] e[ 1,-1] etc. |
e[-1,0] e[-1, 1] |
e[0,-1] e[ 1, 1] |
|
To apply induction here, we would first need to regard all the elements of a certain magnitude as "equivalent" with respect to the property P (i.e., they all possess P or none of them possess P). This is already a slight problem, since many properties depend on more than just the norm of an element. However, we can probably fix that up. |
|
The real problem becomes apparent if we try to apply the Proposition which says that if the elements of a certain magnitude m[k] possess a certain (negated) property -P, then this property is also possessed by at least one of the magnitudes m[0] to m[k−1]. Fair enough, but this doesn't prove that m[k] cannot possess -P. Of course, if we happen to know that m[0] does not possess ‑P, then we would have a contradiction, proving that m[k] must possess P. This is simply the contra-positive of the inductive argument that if m[0] possess P, and if m[j]'s possession of P implies m[j+1]'s possession of P for any given j, then we know m[k] possesses P. |
|
However, suppose we cannot determine by independent means whether m[0], or any other specific magnitude, possesses P. Does our induction argument (whether we phrase it in positive or contra-positive form) enable us to conclude anything now? Evidently not, because it relies on knowledge of the P-status of some specific element to make it effective. |
|
Now suppose we're given one additional piece of information: If m[k] possesses P, then there is a magnitude m[j] strictly less than m[k] such that m[j] possesses P. This information still does not enable us to use our induction argument (positive or contra-positive) to give a definite result, because we still haven't been given the P-status of any magnitude. However, we can now invoke an infinite descent to prove that no magnitude can possess P. We reach this conclusion simply from the fact that there are only a finite number of distinct magnitudes less than any given magnitude, whereas we have shown that if m[k] possesses P there must be infinitely many distinct magnitudes less than m[k]. This argument is actually much closer to the pigeon-hole principle than to induction. In fact, one could argue that it is the pigeon-hole principle. |
|
In summary, induction (whether applied upward or downward, and in the positive or contra-positive sense) always relies on the known P-status of at least one element, whereas the conclusion in an infinite descent is based purely on the structure of the implicative relation and the field of available elements. Thus, they clearly operate on a fundamentally different principle. One could make a strong case that infinite descent is equivalent to a specialized application of the pigeon-hole principle (which is another high-order principle), but not to induction. |
|
One more comment on the above schema for descent: |
|
DESCENT: -P(0) or -P(1) or ... or -P(n−1) <= -P(n) |
|
Notice that this is explicitly a finite descent implication, and can only be made effective by stipulating the P-status of one or more indices. The schema for a true infinite descent looks quite different, because as explained previously, it isn't an implication over a specific set of integers (such as those from 0 to n−1). Rather, it is an implication of the existence of a positive integer of smaller magnitude with the same property, ad infinitum. So it looks like this: |
|
INFINITE DESCENT P(n) → {m | P(m) AND |m| < |n|} |
|
which can also be expressed in the form |
|
P(n0) → P(n1) and P(n2) and P(n3) and ... ad infinitum |
|
where |n0| > |n1| > |n2| > ... ad infinitum |
|
By the way, it's worth being clear about the effect of taking the contra-positive of the usual statement of induction. Suppose we know that P(k) implies P(k+1), and let's consider two cases. First, suppose we can show that P(20) is true. In this case we can conclude that P(21), P(22),... etc. are all true, but we cannot say anything at all about P(19), P(18), etc. Now let's switch cases, and suppose we can show that P(20) is false. In that case we know P(19), P(18),..., P(0), P(−1), P(−2)... must all be false, but we know nothing about the status of P(21), P(22) and so on. In view of this, consider the basic definition of mathematical induction |
|
If P(k) → P(k+1), and if P(N), then P(k) for all k > N |
|
Notice that "k" is an arbitrary index, and the symbol "+1" merely signifies the "next" discrete index in the sequence. This arbitrary quality of the origin and discrete step sizes is an essential aspect of pure induction, which tends to get glossed over if we set k to the numerical magnitude 0 and regard "+1" as numerical addition. By doing that, we introduce non-inductive elements to the schema, which then can be exploited in various ways, such as by taking the contra-positive. |
|
However, if we are careful to define the pure induction schema with arbitrary indices of order as shown above, then notice that taking the contra-positive of this statement does not yield an essentially different statement, because the contra-positive of P(k) → P(k+1) is simply -P(k) → -P(k-1), and the implication of -P(N) is simply -P(k) for all k less than (i.e., prior to) N. So, if we denote -P by Q and if we map the indices j = 10−k, the contra-positive of the basic induction schema is |
|
If Q(j) → Q(j+1), and if Q(N), then Q(j) for all j > N |
|
which is formally identical to the previous expression, and still fundamentally different from an infinite descent. |
|