Infinite Descent versus Induction

It is often said that the principle of "infinite descent", developed
by Fermat in his study of Diophantine analysis, is equivalent to
induction.  In assessing this claim we first need to understand what
kind of "equivalence" is being claimed.  Typically those who claim
such an equivalence will substantiate it with a demonsration whereby
they re-write a proof by descent in the form of a proof by induction,
by means of some set of logical connectives and the other primitive
axioms of Peano arithmetic.  However, it's obvious that since Peano's 
axioms for the natural numbers consist of nothing other than 

   (i) the naturals have a first member
  (ii) each member has a unique successor
 (iii) there are no loops in the chain of successors
  (iv) mathematical induction

the entirety of Peano arithmetic, including every theorem of Peano 
arithmetic, can be derived from these axioms.  Would it be suitable 
to replace (iv) with "infinite descent"?  If the two are completely 
equivalent, then the answer would have to be yes, but I would argue 
that the real answer is no.  It is true that the axiom of induction 
is implicit in the principle of infinite descent, but it is just as 
true that the other three axioms are implicit in the principle of 
infinite descent.  This means that infinite descent is of a higher 
order than these basic axioms, and is constructed by invoking a 
combination of them.  From this point of view, it makes no more 
(and no less) sense to say that infinite descent is equivalent to 
induction than to say that infinite descent is equivalent to the 
unique successor axiom, or to the first member axiom.  All of these
axioms (plus the tacit axioms of logic) are implicated in the
principle of infinite descent, but no one of them individually 
can rightly be said to be "equivalent" to infinite descent.

Those who argue that infinite descent is equivalent to induction
typically refer to a proof such as the following for the fact that
the sum of the first n integers is n(n+1)/2:

  If not, there is a least integer, say N, for which this is 
  false.  N =/= 0, since 0 = 0(0+1)/2.  Since N - 1 < N, we 
  must have the sum of the first N - 1 integers is (N-1)N/2 and 
  adding N to both sides and doing obvious arithmetic, we reach the 
  contradiction that the sum of the first N numbers is N(N+1)/2.
 
However, this example is actually a proof by induction, not a 
proof by infinite descent.  The proof supposes the existence of 
a least integer N such that the sum from 1 to N is NOT equal to 
N(N+1)/2.  So far so good.  A proof by infinite descent would 
then show that there is an absolutely SMALLER number with the same 
property.  This is the key distinction: the number must have a 
smaller MAGNITUDE than N, not just PRECEED N in the ordered 
sequence of numbers.

Pure induction is a chain of implications on the truth values 
of a given ordered sequence of propositions, and the induction 
is "activated" by citing the truth value(s) of one (or more) 
particular propositions (such as the case N=0 in the above
example), from which the others follow.  Notice that the actual
magnitudes of the indices of the sequence are irrelevant; all
that matters is their discrete order.  (For a related discussion,
see Representing Sets of Pure Order).  

In contrast, a proof by infinite descent operates essentially on 
the magnitudes of the indices of an ordered sequence of propositions, 
and shows that the indices cannot all differ by more than any fixed
finite magnitude "epsilon".  This is a subtle but important concept, 
analagous to the different kinds of justifications that can be given 
for the fact that the slope of  y = x^2  is  2x.  We know that for 
some small non-zero increment dx the slope near any given x is 
approximately equal to [(x+dx)^2 - x^2]/dx, which is 2x + dx.  Now, 
on one level it seems that we can just set dx equal to 0, and we're 
done.  However, the key step in our derivation involved a division
by dx, so we really need to invoke a rigorous "infinite descent"
argument leading to the concept of a LIMIT for a sequence of
values of dx approaching nearer to zero than any fixed finite
magnitude.  Thus, there is some validity in the claim that Fermat's
infinite descent concept is one of the proto-types for reasoning
about infintesimals and limiting sequences, which is perhaps not
surprising in view of Fermat's own contributions to the foundation
of the differential calculus.

To illustrate more clearly the distinction between pure induction
and infinite descent, let's define the cumulative sum S(k) by 
the recurrence S(k) = S(k-1) + k with S(0) = 0, and suppose that 
S(N) = N(N+1)/2 + D for some non-zero integer D.  The question 
is, can we necessarily construct a number with the same property 
but a smaller absolute value?  There may be a way, but we can't 
simply use the number N-1, because this doesn't ensure a smaller
magnitude, i.e., we can continue to subtract 1 indefinitely (into 
negative values), and the numbers at some point become larger in
magnitude, even though they are lesser in value.  Thus, we may be 
able to use a chain of implication from N to N-1 as part of an 
induction argument, but not an infinite descent (at least not 
without some other constraint on the indices).

Notice that if N has the above property then simple subtraction 
gives
         S(N-1)   =   S(N) - N   =   (N-1)N/2 + D

so N-1 has the same property.  Therefore, if we can find ANY integer 
k such that S(k)=k(k+1)/2 we can assert that D=0 for all integers
(positive and negative.)  However, this is NOT a proof by infinite 
descent, it is a proof by induction from the particular value of k 
for which we know that S(k) is of the form k(k+1)/2.

In contrast, consider a proof by infinite descent that sqrt(2) is 
irrational.  We suppose there exist a pair of positive integers 
n0,n1 such that  1 + (n1/n0) = sqrt(2).  Now consider the integer 
n2 = n0 - 2n1.  It's easy to verify that 1 + (n2/n1) = sqrt(2),
from which it follows that n2 = [sqrt(2)-1] n1.  Now, we have
the inequality 0 < [sqrt(2)-1] < 1 , from which it follows that
0 < n2 < n1.  Repeating this procedure, we have n3 = n2 - 2n1,
and so on.  Thus, we have an infinite sequence of positive integers 
n0, n1, n2,... of strictly decreasing magnitude, which is impossible
because there are only a finite number of positive integers less 
than any given integer.  This is an example of a genuine proof by 
infinite descent.

Much of the confusion in this area arises from a failure to clearly
distinguish between descent and infinite descent.  We often see
classifications of various "induction hypotheses" allowing us to
assert that the proposition P(n) is true for all natural numbers n
if any of these conditions are satisfied

     WEAK      P(0) and ( P(n-1) => P(n) )
    STRONG     P(0) and P(1) and ... and P(n-1)  =>   P(n)
    DESCENT   -P(0) or -P(1) or  ... or -P(n-1)  <=  -P(n)

Whether it is appropriate to say that these conditons are "equivalent"
depends on precisely what kind of equivalence we mean, because they
also involve logical connectives and other common notions, but in any
case this definition of "DESCENT" is certainly close to pure induction,
since it is simply a contrapositive formulation of strong induction 
(i.e.  Q => P  is equivalent to  -Q <= -P,  its contrapositive).  

However, the above definition of "DESCENT" is quite distinct from the
principle of infinite descent.  Here we see a reason for much of the 
confusion in this area, because people often fail to recognize this
distinction (especially since the term "descent" is often used as
shorthand for "infinite descent").  

It's important to keep in mind that the essential distinction between 
"induction" and "infinite descent" is not whether the argument 
proceeds in an "upward" or "downward" direction, nor whether it's 
applied in a positive or negative logical sense.  The essential 
distinction is that induction relies on a principle of discrete 
ORDER and completeness, whereas "infinite descent" represents a 
higher-order principle of discrete absolute MAGNITUDE combined 
with something very much like the pigeon-hole principle.

The above schema for "DESCENT" clearly doesn't embody the principle 
of "infinite descent", because it simply asserts that if the integer 
n does not have a specific property P, then at least one of the 
integers 0 to n-1 must also not have the property P.  From this 
it follows that 0 must not have the property.  Thus, if we happen to 
know that 0 DOES have the property, we have a contradiction.  This 
is indeed equivalent to induction, but it isn't an "infinite descent" 
argument.

A true argument by "infinite descent" doesn't assert something about 
the set of numbers 0 to n-1, it asserts THE EXISTENCE of an integer 
that is both non-zero and smaller in *magnitude* than n.  Specifically, 
it asserts that if an integer n has the property P, then THERE EXISTS 
a non-zero integer of magnitude less than |n| with the same property.
This implies an INFINITE sequence of strictly decreasing integer 
magnitudes beginning with |n|, which is impossible.  Notice that 
the argument does not depend on knowledge of the P-status of ANY 
particular number.  Also, notice that we can apply this form of 
argument in any context where every element has a non-negative 
"norm" (i.e., magnitude), and the difference between any two 
distinct magnitudes is greater than some fixed constant.

To express the essence of induction and infinite descent a little 
more precisely, let's first consider the classical notion of 
mathematical induction in a context slightly beyond the naturals, 
namely, the integers.  An inductive argument relies on the fact 
that we can arrange the elements of some infinite set in a definite 
ordered linear sequence, one that can be indexed with the integers.  
Thus we have (in general) a "doubly infinite" ordered sequence of 
elements
         ..., e[-2], e[-1], e[0], e[1], e[2], ...

and we can assert that this sequence contains ALL the elements e[] 
of the set in question.  Now, induction says that if the possession 
of a specific property P by the element e[j] for any arbitrary 
integer j implies that e[j+1] also has the property P, and if we 
can establish for some specific integer k that e[k] possesses the 
property P, then EVERY element e[j] for j greater than or equal to 
k possesses the property P.

Needless to say, we can replaces e[j+1] with e[j-1], and replace 
"greater than" with "less than" in the above proposition, and the 
resulting proposition also holds.  Thus, we can "ascend" or "descend" 
with this argument.  Furthermore, we can also logically negate 
either of these propositions (using DeMorgan's rule) to arrive at 
equivalent true statements.  However, in all cases, the statements 
produced in this way lead us to a definite conclusion ONLY with the 
addition of knowledge of the P-status of at least one particular 
element.  Thus, the essential technique of mathematical induction 
consists of establishing the P-status of one or more element(s) 
e[k] by prior means, and then invoking some general relation 
between the P-status of one element and the P-status of some others 
in the ordered sequence of elements.  Notice that we haven't used 
any notion of the relative "magnitudes" of the elements e[] (they 
may not even HAVE magnitudes).  We have simply used the ordering 
and completeness of the sequence.

Now let me describe my conception of "infinite descent", which seems 
to be different than some popular conceptions.  We have (in general) 
an infinite set of elements, each of which has a finite non-negative 
"magnitude" (norm), and the difference between any two distinct 
magnitudes is greater than some fixed finite quantity D.  Notice that
these elements need not be given in any particular linear order.  For
example, we might have elements (e.g., Gaussian integers) that can be
indexed by two integers 

              etc.
 etc...  e[-2,-2]  e[-1,-2]  e[0,-2]  e[1,-2]  e[2,-2] ...etc
 etc...  e[-2,-1]  e[-1,-1]  e[0,-1]  e[1,-1]  e[2,-1] ...etc
 etc...  e[-2, 0]  e[-1, 0]  e[0, 0]  e[1, 0]  e[2, 0] ...etc
 etc...  e[-2, 1]  e[-1, 1]  e[0, 1]  e[1, 1]  e[2, 1] ...etc
 etc...  e[-2, 2]  e[-1, 2]  e[0, 2]  e[1, 2]  e[2, 2] ...etc
              etc.

Suppose we can show that if any one of these elements has the property 
P, then the property P must also be possessed by another element whose 
magnitude is strictly less than the magnitude of the former.  Clearly 
this is impossible, because it implies an infinite descent in steps 
no smaller than D, beginning from a finite starting magnitude, whereas 
there are only finitely many distinct magnitudes less than any given 
magnitude.

It might seem this argument can be re-cast in the form of a simple 
induction on the linear sequence of magnitudes, but it can't.  To see 
why, suppose we arrange the elements of our set according to their 
magnitudes, and we get something like this

        0         1       sqrt(2)
     -------   -------   --------                            
               e[1, 0]   e[-1,-1]
     e[0, 0]   e[0, 1]   e[ 1,-1]      etc.
               e[-1,0]   e[-1, 1]
               e[0,-1]   e[ 1, 1]

To apply induction here, we would first need to regard all the 
elements of a certain magnitude as "equivalent" with respect to 
the property P (i.e., they all possess P or none of them possess P).
This is already a slight problem, since many properties depend on 
more than just the norm of an element.  However, we can probably 
fix that up.  

The real problem becomes apparent if we try to apply the Proposition
which says that if the elements of a certain magnitude m[k] possess a
certain (negated) property notP, then this property notP is also 
possessed by AT LEAST one of the magnitudes m[0] to m[k-1].  Fair 
enough, but this doesn't prove that m[k] cannot possess notP.  Of 
course, if we happen to know that m[0] does not possess notP, then 
we would have a contradiction, proving that m[k] must possess P.
This is simply the contra-positive of the inductive argument that 
if m[0] possess P, and if m[j]'s possession of P implies m[j+1]'s 
possession of P for any given j, then we know m[k] possesses P.

However, suppose we cannot determine by independent means whether 
m[0], or any other specific magnitude, possesses P.  Does our 
induction argument (whether we phrase it in positive or contra-
positive form) enable us to conclude anything now?  Evidently not, 
because it relies on knowledge of the P-status of some specific 
element to make it effective.

Now suppose we're given one additional piece of information:  If m[k] 
possesses P, then there is a magnitude m[j] strictly less than m[k] 
such that m[j] possesses P.  This information still does not enable 
us to use our induction argument (positive or contra-positive) to give 
a definite result, because we still haven't been given the P-status 
of any magnitude.  However, we CAN now invoke an infinite descent to 
prove that NO magnitude can possess P.  We reach this conclusion 
simply from the fact that there are only a finite number of distinct 
magnitudes less than any given magnitude, whereas we have shown that 
if m[k] possesses P there must be infinitely many distinct magnitudes 
less than m[k].  This argument is actually much closer to the pigeon-
hole principle than to induction.  In fact, one could argue that it 
IS the pigen-hole principle.

In summary, induction (whether applied upward or downward, and in the 
positive or contra-positive sense) always relies on the known P-status 
of at least one element, whereas the conclusion in an infinite descent 
is based purely on the structure of the implicative relation and the 
field of available elements.  Thus, they clearly operate on a 
fundamentally different principle.  One could make a strong case 
that infinite descent is equivalent to a specialized application of 
the pigeon-hole principle (which is another high-order principle), 
but not to induction.

One more comment on the above schema for DESCENT:

      -P(0) or -P(1) or  ... or -P(n-1)  <=  -P(n)

Notice that this is explicitly a FINITE descent implication, and 
can only be made effective by stipulating the P-status of one or 
more indicies.  The schema for a true infinite descent looks quite 
different, because as explained previously, it isn't an implication 
over a specific set of integers (such as those from 0 to n-1).  
Rather, it is an implication of the EXISTENCE of a positive integer 
of smaller magnitude with the same property, ad infinitum.  So it 
looks like this:

INFINITE DESCENT       P(n) =>  {m | P(m) AND |m| < |n|}

which can also be expressed in the form

  P(n0)  =>  P(n1) and P(n2) and P(n3) and ... ad infinitum

   where  |n0| > |n1| > |n2| > ...  ad infinitum

By the way, it's worth being clear about the effect of taking the 
contra-positive of the usual statement of induction.  Suppose we 
know that P(k) implies P(k+1), and let's consider two cases.  First, 
suppose we can show that P(20) is TRUE.  In this case we can conclude 
that P(21), P(22),... etc are all TRUE, but we cannot say anything at 
all about P(19), P(18), etc.  Now let's switch cases, and suppose we 
can show that P(20) is FALSE.  In that case we know P(19), P(18),...,
P(0), P(-1), P(-2)... must all be FALSE, but we know nothing about 
the status of P(21), P(22) and so on.  In view of this, consider the 
basic definition of mathematical induction

  If P(k) => P(k+1), and if P(N), then P(k) for all k > N

Notice that "k" is an arbitrary index, and the symbol "+1" merely
signifies the "next" discrete index in the sequence.  This arbitrary
quality of the origin and discrete step sizes is an essential aspect 
of pure induction, which tends to get glossed over if we set k to the 
numerical magnitude 0 and regard "+1" as numerical addition.  By doing
that, we introduce non-inductive elements to the schema, which then
can be exploited in various ways, such as by taking the contra-
positive.

However, if we are careful to define the pure induction schema with
arbitrary indicies  of order as shown above, then notice that taking 
the contra-positive of this statement does not yield an essentially 
different statement, because the contra-positive of  P(k)=>P(k+1)  is 
simply  notP(k)=>notP(k-1), and the implication of notP(N) is simply  
notP(k)  for all k less than (viz, prior to) N.  So, if we denote 
notP by Q and if we map the indicies j=10-k, the contra-positive of 
the basic induction schema is

  If Q(j) => Q(j+1), and if Q(N), then Q(j) for all j > N

which is formally identical to the previous expression, and still
fundamentally different from an infinite descent.

Return to MathPages Main Menu