Ancient Square Roots

The ancient Babylonians had a nice method of computing square roots
that can be applied using only simple arithmetic operations.  To find
a rational approximation for the square root of an integer N, let k 
be any number such that k^2 is less than N.  Then k is slightly less 
than the square root of N, and so N/k is slightly greater than the 
square root of N.  It follows that the average of these two numbers 
gives an even closer estimate

                          k + N/k
                k_new  =  -------
                             2

Iterating this formula leads to k values that converge very rapidly
on the square root of N.  This formula is sometimes attributed to
Heron of Alexandria, because he described it in his "Metrica", but
it was evidently known to the Babylonians much earlier.  
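
Carrying out the iteration with exact rational arithmetic is
straightforward; here is a minimal Python sketch (the function name
is my own choice):

```python
from fractions import Fraction

def babylonian_sqrt(N, k, steps):
    """Iterate the Babylonian/Heron rule k -> (k + N/k)/2 exactly."""
    k = Fraction(k)
    for _ in range(steps):
        k = (k + Fraction(N) / k) / 2
    return k

r = babylonian_sqrt(7, 2, 3)
print(r, float(r)**2)   # 108497/41008, whose square is 7.0000039...
```

Starting from k=2 the iterates are 11/4, 233/88, 108497/41008, ...,
each roughly doubling the number of correct digits.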

It's interesting to note that this formula can be seen as a special 
case of a more general formula for the nth roots of N, which arises
from the ancient Greek "ladder arithmetic", what we would now call
linear recurring sequences.  The basic "ladder rule" for generating 
a sequence of integers to yield the square root of a number N is the 
recurrence formula

            s[j] = 2k s[j-1] + (N-k^2) s[j-2]

where k is the largest integer such that k^2 is less than N.  The 
ratio s[j+1]/s[j] approaches sqrt(N) + k as j goes to infinity.  The 
sequence s[j] is closely related to the continued fraction for
sqrt(N).  This method does not converge as rapidly as the Babylonian
formula, but it's often more convenient for finding the "best" 
rational approximations for denominators of a given size.

As an example, to find r = sqrt(7) we have N=7 and k=2, so the 
recurrence formula is simply s[j] = 4s[j-1] + 3s[j-2].  If we choose 
the initial values s[0]=0 and s[1]=1 we have the following sequence

 0, 1, 4, 19, 88, 409, 1900, 8827, 41008, 190513, 885076, ...

On this basis, the successive approximations and their squares are

               r                       r^2
      ---------------------     -----------------
       (4/1) - 2                   4.0
       (19/4) - 2                  7.5625
       (88/19) - 2                 6.9252...
       (409/88) - 2                7.0104...
       (1900/409) - 2              6.99854...
       (8827/1900) - 2             7.000201...
       (41008/8827) - 2            6.9999719...
       (190513/41008) - 2          7.00000390...
       (885076/190513) - 2         6.999999457...
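
This ladder computation is easy to replicate; a minimal Python sketch
(the helper name is mine):

```python
def ladder(N, k, count):
    """Ladder rule s[j] = 2k*s[j-1] + (N - k^2)*s[j-2], s[0]=0, s[1]=1."""
    s = [0, 1]
    while len(s) < count:
        s.append(2*k*s[-1] + (N - k*k)*s[-2])
    return s

s = ladder(7, 2, 11)
print(s)                  # [0, 1, 4, 19, 88, 409, 1900, 8827, ...]
print(s[-1]/s[-2] - 2)    # approaches sqrt(7) = 2.6457513...
```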

Incidentally, applying this method to the square root of 3 gives the
upper and lower bounds used by Archimedes in his calculation of pi.
(See Archimedes and Sqrt(3).)  

The reasoning by which the ancient Greeks arrived at this "ladder
arithmetic" is not known, but from a modern perspective it follows
directly from the elementary theory of difference equations and
linear recurrences.  If we let q denote the limiting ratio of 
successive terms s[j+1]/s[j], then the proposition is that q equals
sqrt(N) + k.  In other words, we are trying to find the number q
such that 
                    (q - k)^2  =  N

Expanding this and re-arranging terms gives

                q^2 - 2k q - (N-k^2)  =  0

It follows that the sequence of numbers s[0]=1, s[1]=q, s[2]=q^2,..,
s[n]=q^n,... satisfies the recurrence

             s[n] = 2k s[n-1] + (N-k^2) s[n-2]

as can be seen by dividing through by s[n-2] and noting that q equals
s[n-1]/s[n-2] and q^2 = s[n]/s[n-2].  Not surprisingly, then, if we 
apply this recurrence to arbitrary initial values s[0], s[1], the 
ratio of consecutive terms of the resulting sequence approaches q.
(This is shown more rigorously below.)

It's interesting that if we assume an EXACT rational (non-integer) 
value for sqrt(N) and exercise the algorithm in reverse, we generate 
an infinite sequence of positive integers whose magnitudes are 
strictly decreasing, which of course is impossible.  From this
the Greeks might have inferred that there can not be an exact 
rational expression for any non-integer square root, and thereby 
discovered irrational numbers.

A similar method can be used to give the 3rd, 4th, or any higher root 
of any integer using only simple integer operations.  In fact, this 
same method can be applied to find the largest real root of arbitrary
polynomials.  To illustrate, consider the general polynomial

         x^k - c1 x^(k-1) - c2 x^(k-2) ... - ck  = 0         (1)

The corresponding linear recurrence is

       s[n] = c1 s[n-1] + c2 s[n-2] + ... + ck s[n-k]        (2)

If we let r1,r2,...,rk denote the k roots of this polynomial (assumed
distinct), then s[n] is given by

        s[n] = A1 r1^n + A2 r2^n + ... + Ak rk^n

where the coefficients A1,A2,..,Ak are constants which may be 
determined by k initial values of the sequence.  If r1 is the 
greatest of the roots (in terms of the norms) and is a positive 
real number then, as n increases, the first term of s[n] becomes 
dominant, and so the ratio s[n+1]/s[n] approaches 

          [A1 r1^(n+1)] / [A1 r1^n]  =  r1

Hence we can use the recurrence relation (2) to generate arbitrarily 
precise rational approximations of the largest root of (1).  This is 
why the ratio of successive Fibonacci numbers 1,1,2,3,5,8,13,21,...
goes to the dominant root of x^2 - x - 1, which is [1+sqrt(5)]/2.
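
As a numerical illustration of this principle, a small Python sketch
(the helper name is mine) that iterates a recurrence of the form (2)
and returns the ratio of its last two terms:

```python
def dominant_root(coeffs, init, steps):
    """Iterate s[n] = c1*s[n-1] + ... + ck*s[n-k]; return s[-1]/s[-2]."""
    s = list(init)
    for _ in range(steps):
        # zip stops after k coefficients, pairing c1 with s[n-1], etc.
        s.append(sum(c * x for c, x in zip(coeffs, s[::-1])))
    return s[-1] / s[-2]

# Fibonacci numbers: x^2 - x - 1, dominant root (1 + sqrt(5))/2
print(dominant_root([1, 1], [1, 1], 40))   # 1.6180339887...
```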

To illustrate, consider the "Tribonacci sequence", whose recurrence 
formula is s[n]=s[n-1]+s[n-2]+s[n-3], and whose characteristic 
equation is the cubic

                    x^3 - x^2 - x - 1 = 0

The roots of this polynomial are

        r1 =  1.839286755214161..
        r2 = -0.419643377607081.. + i 0.606290729207199..
        r3 = -0.419643377607081.. - i 0.606290729207199..

If we arbitrarily select the initial values s[0]=1, s[1]=2, and s[2]=4,
we have the coefficients

        A1 =  1.137451572282629..
        A2 = -0.068725786121314.. + i 0.12352180763419..
        A3 = -0.068725786121314.. - i 0.12352180763419..

All these numbers can be expressed as nested radicals given by the 
solution of the cubic, but I'm too lazy to type them.  Anyway, the 
nth term of the recurrence is

         s[n]  =  A1 r1^n + A2 r2^n + A3 r3^n

Notice that the magnitudes of r2 and r3 are less than 1, so as we 
take higher powers these terms become progressively smaller, and 
soon become negligible.  On the other hand, r1 is greater than 1, 
so as n increases the first term becomes larger, and we have 
s[n] -> A1 r1^n  as n goes to infinity, and so s[n]/s[n-1] approaches
the root r1.
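
A few lines of Python confirm this numerically:

```python
# Tribonacci sequence with the initial values s[0]=1, s[1]=2, s[2]=4
s = [1, 2, 4]
for _ in range(50):
    s.append(s[-1] + s[-2] + s[-3])

print(s[-1] / s[-2])   # 1.83928675521416..., the dominant root r1
```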

There are many fascinating things about linear recurrences like this.
For example, they can be used to test numbers for primality.  (See the 
article on "Symmetric Pseudoprimes".)  By the way, to avoid complex 
numbers, we can also express s[n] for the Tribonacci sequence as

    s[n] = A1 r1^n  +  2B^n [u cos(nq) - v sin(nq)]

          B =  0.737352705760327...
          q =  2.176233545491871...
          u = -0.068725786121314.. 
          v =  0.12352180763419..

Naturally we can also use this method to generate rational 
approximations for the jth root of any integer M, simply by 
considering the polynomial x^j - M = 0.  The only difficulty is that 
the magnitudes of all j roots of this polynomial (the complex jth 
roots of M) are equal, so the sequence doesn't converge.  This 
is clear from the recurrence s[n] = M s[n-j], which just gives j 
disjoint geometric sequences.  However, we can solve this problem by 
a simple change of variables.  If we define y such that x = y - k, 
where k^j is the greatest jth power less than M, we can make this 
substitution to give the polynomial (y-k)^j - M = 0.  For example, 
with j=2 (for square roots) we have

               y^2 - 2k y - (M - k^2)  =  0

The corresponding recurrence is 

             s[n]  =  2k s[n-1]  +  (M-k^2) s[n-2]

which is the same as the "ladder arithmetic" square root method given 
earlier.  Notice that the substitution x = y - k simply shifts the 
roots in the positive real direction, moving the negative square root 
closer to zero and the positive square root further from zero.  This 
gives the roots of the characteristic polynomial very unequal 
magnitudes, so the positive real square root is dominant and the 
convergence is rapid.  

The same simple method can be applied to higher-order roots, noting
that the nth roots of the number N are arranged in a circle on the 
complex plane, centered on the origin.  By simply shifting this circle
of roots in the positive real direction we can make the magnitude of 
the positive real root greater than the magnitude of any of the other 
roots (because the radius of the original circle of roots is smaller
than the new positive real root).  To illustrate, the cube root of 2 
can be found using this method by means of the recurrence 

            s[n] =  3s[n-1] - 3s[n-2] + 3s[n-3]

With the initial values 0,0,1 we have the sequence

  0, 0, 1, 3, 6, 12, 27, 63, 144, 324, 729, 1647, 3726,...

After every three steps the terms acquire another common factor of 3,
which we can divide out to keep the terms from growing so fast.  This
gives

         1,2, 4,9,21,48,...
                3, 7,16,36,81,183,...
                        12,27, 61,138,312,705,...

and so on.  In this way we can find the rational approximations

               s[6]/s[5] - 1  =  5/4
              s[13]/s[12] - 1  =  29/23
              s[17]/s[16] - 1  =  63/50
              s[25]/s[24] - 1  =  635/504

The cube of 635/504 is 1.999998024...
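
The same computation in Python, iterating well past the terms listed
above:

```python
# Shifted recurrence for 1 + 2**(1/3):  s[n] = 3s[n-1] - 3s[n-2] + 3s[n-3]
s = [0, 0, 1]
for _ in range(60):
    s.append(3*s[-1] - 3*s[-2] + 3*s[-3])

print(s[3:13])            # [3, 6, 12, 27, 63, 144, 324, 729, 1647, 3726]
print(s[-1]/s[-2] - 1)    # approaches 2**(1/3) = 1.2599210...
```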

This approach gives a simple linear recurrence, but when determining 
higher order roots it is clearly not very efficient, because the nth 
roots of the number N are uniformly arranged in a circle on the 
complex plane, centered on the origin.  It's easy to see that we 
can't make the magnitude of the positive real root much greater 
than the magnitudes of the neighboring roots by just shifting in 
the positive real direction.

To improve the convergence, let's approach the problem systematically.
The polynomial  x^n - N = 0  has n complex roots x1,x2,..,xn, all of 
equal magnitude.  Let x1 denote the positive real root, and suppose 
X is a positive real estimate of x1.  We can then define a polynomial 
with the roots w1,w2,..,wn given by

                          xj + X
                   wj  =  ------
                          xj - X

Since X is a positive real number, it is closer to x1 than to any of 
the other x roots, so x1 - X is the denominator of smallest magnitude, 
and x1 + X is the numerator of largest magnitude.  Consequently, w1 
is the w value 
with the largest magnitude.  Solving the above expression for xj gives

                             wj + 1
                   xj  =  X  ------
                             wj - 1
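
A quick numerical check of this claim (the particular values of N, n,
and X here are arbitrary illustrations):

```python
import cmath

N, n, X = 7.0, 5, 1.4    # estimating the 5th root of 7 (about 1.4758)
x_roots = [N**(1/n) * cmath.exp(2j * cmath.pi * j / n) for j in range(n)]
w_roots = [(x + X) / (x - X) for x in x_roots]
print([round(abs(w), 3) for w in w_roots])   # first entry is by far the largest
```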

Substituting xj into the original polynomial x^n - N = 0 gives
an equation with the roots wj

                   n  / w + 1 \ n
                  X  ( ------- )   -  N  =  0
                      \ w - 1 /

Expanding this and clearing the denominator gives the polynomial

       n        n         n        n-1
    ( X  - N ) w   +  n( X  + N ) w    +  ...  =  0

Since w1 is the largest root (probably by a large margin), and it's
one of at most two real roots, the other of which is the smallest,
a rough estimate of w1 is just the sum of the w roots, which equals
the negative of the coefficient of w^(n-1) in the monic form of the
polynomial (i.e., the above polynomial divided by its leading
coefficient).  Letting W denote this estimate, we have

                         X^n + N
                 W  =  n --------
                         N - X^n

An improved estimate for x1 is therefore X(W+1)/(W-1), which gives

                        (n-1)X^n + (n+1)N
           X_new  =  X  -----------------
                        (n+1)X^n + (n-1)N
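
This improvement step is easy to try with exact rational arithmetic;
a Python sketch (the function name is mine):

```python
from fractions import Fraction

def root_step(X, N, n):
    """One step of X_new = X*((n-1)X^n + (n+1)N) / ((n+1)X^n + (n-1)N)."""
    Xn = X**n
    return X * ((n - 1)*Xn + (n + 1)*N) / ((n + 1)*Xn + (n - 1)*N)

X = Fraction(2)              # initial estimate for sqrt(7)
for _ in range(3):
    X = root_step(X, 7, 2)
    print(X, float(X)**2)

# n = 3: from X = 1, one step gives 5/4 as an estimate of 2**(1/3)
print(root_step(Fraction(1), 2, 3))   # 5/4
```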

Although this looks quite different, it is actually a generalization
of the ancient Babylonian (and Heron's) method for square roots.  To
see this, note that a slightly less efficient application of this 
method would be to replace w with 2w+1, which leads to the expression

                              2w + 2
                    x   =  X  ------
                                2w

If we substitute this expression for x into the original equation
x^n - N = 0  and expand, we get a polynomial in w such that the real
positive x root maps to the largest w root, and the remaining w roots
are relatively small (although they are not as small as with the
previous substitution, which is why this formula isn't quite as
efficient).  For square roots we have n=2 and we take the negative of
the coefficient of w in the monic form of this polynomial as a good 
estimate of the largest root w, leading to the result

                          X^2 + N        X + N/X
           X_new   =   X  -------   =    -------
                           2X^2             2

which is the ancient Babylonian/Heron formula.  In contrast, using 
the more nearly optimum substitution with the same approach leads to
the formula for square roots

                        X^2 + 3N
           X_new  =  X  --------
                        3X^2 + N

which converges faster than the ancient formula, since each step
roughly triples (rather than doubles) the number of correct digits.
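
A side-by-side comparison of the two square-root formulas, starting
both from the same initial estimate:

```python
from fractions import Fraction

N = 7
bab = hal = Fraction(2)
for i in range(4):
    bab = (bab + N / bab) / 2                      # Babylonian/Heron step
    hal = hal * (hal**2 + 3*N) / (3*hal**2 + N)    # the faster variant
    print(i, float(bab*bab - N), float(hal*hal - N))
```

The error column for the second formula shrinks markedly faster at
each step.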

It's interesting to notice the appearance of the quantity (w+1)/(w-1)
in this context, considering the relation between root extractions
and logarithms (if x^n = N then ln(x) = ln(N)/n), because we also
have ln[(w+1)/(w-1)] = 2[1/w + 1/(3w^3) + 1/(5w^5) + ...].  Is it
true that the substitution (w+1)/(w-1) is optimal for extracting
the nth root?
