Ancient Square Roots

The ancient Babylonians had a nice method of computing square roots that can be applied using only simple arithmetic operations. To find a rational approximation for the square root of an integer N, let m be any number such that m² is less than N. Then m is slightly less than the square root of N, and so N/m is slightly greater than the square root of N. It follows that the average of these two numbers gives an even closer estimate

Iterating this formula leads to m values that converge very rapidly on the square root of N. This formula is sometimes attributed to Heron of Alexandria, because he described it in his "Metrica", but it was evidently known to the Babylonians much earlier.

This formula can be seen as a special case of a more general formula for the nth roots of N, which arises from the ancient Greek "ladder arithmetic", which we would call linear recurring sequences. The basic "ladder rule" for generating a sequence of integers to yield the square root of a number N is the recurrence formula

where m is the largest integer such that m² is less than N. The ratio s_n+1/s_n approaches √A + N as n goes to infinity. The sequence s_n is closely related to the continued fraction for √N. This method does not converge as rapidly as the Babylonian formula, but it's often more convenient for finding the "best" rational approximations for denominators of a given size.

As an example, to find r = √7 we have N=7 and m=2, so the recurrence formula is

If we choose the initial values s₀ = 0 and s₁ = 1 we have the following sequence

On this basis, the successive approximations and their squares are

Incidentally, applying this method to the square root of 3 gives the upper and lower bounds used by Archimedes in his calculation of π. (See Archimedes and √3.)

The reasoning by which the ancient Greeks arrived at this "ladder arithmetic" is not known, but from a modern perspective it follows directly from the elementary theory of difference equations and linear recurrences. If we let q denote the ratio of successive terms q = s_n+1/s_n, then the proposition is that q approaches √N + m. In other words, we are trying to find the number q such that (q – m)² = N. Expanding this and re-arranging terms gives

It follows that the sequence of numbers s₀=1, s₁=q, s₂=q², s₃=q³,.. satisfies the recurrence

as can be seen by dividing through by s_n–2 and noting that q equals s_n–1/s_n–2 and q² = s_n/s_n–2. Not surprisingly, then, if we apply this recurrence to arbitrary initial values s₀, s₁, the ratio of consecutive terms of the resulting sequence approaches q. (This is shown more rigorously below.)

It's interesting that if we assume an exact rational (non-integer) value for √A and exercise the algorithm in reverse, we generate an infinite sequence of positive integers whose magnitudes are strictly decreasing, which of course is impossible. From this the Greeks might have inferred that there cannot be an exact rational expression for any non-integer square root, and thereby discovered irrational numbers.

A similar method can be used to give the 3rd, 4th, or any higher root of any integer using only simple integer operations. In fact, this same method can be applied to find the largest real root of arbitrary polynomials. To illustrate, consider the general polynomial of degree d

The corresponding linear recurrence is

If we let r₁, r₂, ..., r_d denote the d distinct roots of the polynomial, then s_n is given by

where the coefficients A₁, A₂, …, A_d are constants which may be determined by d initial values of the sequence. If r₁ is the greatest of the roots (in terms of the norms) and is a positive real number then, as n increases, the first term of s_n becomes dominant, and so the ratio s_n+1/s_n approaches

Hence we can use the recurrence relation (2) to generate arbitrarily precise rational approximations of the largest root of (1). This is why the ratio of successive Fibonacci numbers 1, 1, 2, 3, 5, 8, 13, 21, ... goes to the dominant root of x² – x – 1, which is [1 + √5]/2.

To illustrate, consider the "Tribonacci sequence", whose recurrence formula is

and whose characteristic equation is the cubic

The roots of this polynomial are

If we arbitrarily select the initial values s₀=1, s₁=2, and s₂=4, we have the coefficients

All these numbers can be expressed as nested radicals given by the solution of the cubic. The nth term of the recurrence is

Notice that the magnitudes of r₂ and r₃ are less than 1, so as we take higher powers these terms become progressively smaller, and soon become negligible. On the other hand, r₁ is greater than 1, so as n increases the first term becomes larger, and we have s_n → A₁ r₁ⁿ as n goes to infinity, and so s_n/s_n–1 approaches the root r₁.

There are many fascinating things about linear recurrences. For example, they can be used to test numbers for primality. (See the article on Symmetric Pseudoprimes.) By the way, to avoid complex numbers, we can also express s_n for the Tribonacci sequence as

where

Naturally we can also use this method to generate rational approximations for the jth root of any integer M, simply by considering the polynomial x^j – M = 0. The only difficulty is that the magnitudes of all n roots of this polynomial (the complex jth roots of M) are all equal, so the sequence doesn't converge. This is clear from the recurrence s_n = M s_n–j, which just gives j disjoint geometric sequences. However, we can solve this problem by a simple change of variables. If we define y such that x = y – k where k^j is the greatest jth power less than M, we can make this substitution to give the polynomial (y–k)^j – M = 0. For example, with j=2 (for square roots) we have

The corresponding recurrence is the same as for the "ladder arithmetic" square root method given earlier. Notice that in order to give the roots of the characteristic polynomial very unequal magnitudes, so that the positive real square root would be dominant and cause rapid convergence, we make the substitution x = y – k, which just shifts the roots in the positive real direction, so that the negative square root was closer to zero and the positive square root was further from zero.

The same simple method can be applied to higher-order roots, noting that the nth roots of the number N are arranged in a circle on the complex plane, centered on the origin. By simply shifting this circle of roots in the positive real direction we can make the magnitude of the positive real root greater than the magnitude of any of the other roots (because the radius of the original circle of roots is smaller than the new positive real root). To illustrate, the cube root of 2 can be found using this method by means of the recurrence

With the initial values 0,0,1 we have the sequence

Each sequence of 3 terms is commonly divisible by 3, so we can factor these out to reduce the growth rate of the numbers. This gives

and so on. In this way we can find the rational approximations

The cube of 635/504 is 1.999998024...

This approach gives a simple linear recurrence, but when determining higher order roots it is clearly not very efficient, because the nth roots of the number N are uniformly arranged in a circle on the complex plane, centered on the origin. It's easy to see that we can't make the magnitude of the positive real root much greater than the magnitudes of the neighboring roots by just shifting in the positive real direction.

To improve the convergence, let's approach the problem systematically. The polynomial xⁿ – N = 0 has n complex roots x₁, x₂, ..., x_n, all of equal magnitude. Let x₁ denote the positive real root, and suppose X is a positive real estimate of x₁. We can then define a polynomial with the roots w₁, w₂, ...,w_n given by

Since X is a positive real number, it is closer to x₁ than to any of the other x roots, and so x₁ – X is the smallest denominator, and x₁ + X is the largest numerator. Consequently, w₁ is the w value with the largest magnitude. Solving the above expression for x_j gives

Substituting x_j into the original polynomial xⁿ – N = 0 gives an equation with the roots w_j

Expanding this and clearing the denominator gives the polynomial

Since w₁ is the largest root (probably by a large margin), and it's one of at most two real roots, and the other real root is the smallest, a rough estimate of w₁ is just the sum of the w roots, which equals the coefficient of w^n–1 of the monic form of the polynomial (i.e., the above polynomial divided by the first coefficient). Letting W denote this coefficient, we have

An improved estimate for x₁ is therefore X(W+1)/(W–1), which gives

Although this looks quite different, it is actually a generalization of the ancient Babylonian (and Heron's) method for square roots. To see this, note that a slightly less efficient application of this method would be to replace w with 2(w+1), which leads to the substitution

If we substitute this expression for x into the original equation xⁿ – N = 0 and expand, we get a polynomial in w such that the real positive x root maps to the largest w root, and the remaining w roots are relatively small (although they are not as small as with the previous substitution, which is why this formula isn't quite as efficient). For square roots we have n=2 and we take the coefficient of w of the monic form of this polynomial as a good estimate of the largest root w, leading to the result

which is the ancient Babylonian/Heron formula. In contrast, using the more nearly optimum substitution with the same approach leads to the formula for square roots

which converges slightly faster than the ancient formula.

It's interesting to notice the appearance of the quantity (w+1)/(w–1) in this context, considering the relation between root extractions and logarithms (if xⁿ = N then ln(x) = ln(N)/n), because we also have ln[(w+1)/(w–1)] = 2[1/w + 1/(3w³) + 1/(5w⁵) + ...]. It is true that the substitution (w+1)/(w–1) is optimal for extracting the nth root?

Return to MathPages Main Menu