The Step and the Footnote

When discussing the discovery of special relativity, Einstein spoke (“in a curiously impersonal way” according to Pais) of “the Step” (den Schritt), referring to the realization that the Lorentz transformation transcends its connection with Maxwell’s equations and actually represents the relationship between inertial coordinate systems, i.e., the most natural measures of space and time in any given inertial frame. The key to this understanding was the relativity of simultaneity. Einstein’s first paper on relativity (“On the Electrodynamics of Moving Bodies”) is a remarkably mature and comprehensive presentation of this new point of view, but not surprisingly (given the novelty) it contains some awkward passages and points that can be confusing, especially when read in the modern context rather than the context in which it was written.

The paper was written just after Einstein finished his famous paper on the photo-electric effect, in which he explains why, despite the many empirical successes of Maxwell’s equations and the classical wave theory of electromagnetic radiation, there are reasons to believe that “the theory of light, operating with continuous spatial functions, leads to contradictions when applied to the phenomena of emission and transformation of light”. In these phenomena it appears that light behaves in some ways like localized ballistic corpuscles rather than expanding waves. The theory of black body radiation, the photo-electric effect, and other related phenomena, led him to the belief that Maxwell’s equations were inadequate to account for these phenomena, and therefore could not form part of the foundations of physics.

All my attempts, however, to adapt the theoretical foundations of physics to these phenomena failed completely. It was as if the ground had been pulled out from under one, with no firm foundation to be seen anywhere upon which one could have built… Reflections of this type made it clear to me as long ago as shortly after 1900, i.e., shortly after Planck's trailblazing work, that neither mechanics nor electrodynamics could (except in limiting cases) claim exact validity. By and by I despaired of the possibility of discovering the true laws... I came to the conviction that only the discovery of a universal formal principle could lead us to assured results...

This brings us to Einstein’s paper on the electrodynamics of moving bodies, in which two fundamental formal principles are proposed. He actually gives two statements of these principles, first in the introduction, and then again more formally at the beginning of Section 2. In the introductory section, after noting the failure of all attempts to detect the motion of the earth relative to the “light medium”, he says

In all coordinate systems in which the mechanical equations are valid, also the same electrodynamic and optical laws are valid…, [and] in empty space light is always propagated with a definite velocity c which is independent of the state of motion of the emitting body.

When he refers here to “the mechanical equations” and “the electrodynamic laws” he is not referring to Newton’s or Maxwell’s equations, because he knows that neither of these is correct except in some suitable limit. He is referring to the true laws of mechanics and electrodynamics, whatever they may be, since this is necessary to account for the failure of the attempts to detect the earth’s motion. All such attempts amount to efforts to detect the difference between mechanical and electrodynamical transformations to different frames of reference. The failure of these attempts implies that these phenomena actually transform in exactly the same way, even though Newton’s and Maxwell’s equations do not transform in the same way.

We also note that Einstein has quietly introduced the concept of “empty space”, along with the assertion that light propagates in it. This is an important conceptual step, since in theories positing a luminiferous ether there is no empty space (or, if there is, light cannot propagate in it). The assumption that light propagates in “empty space” represents a significant conceptual shift, implying that electromagnetic radiation behaves in some ways like ballistic entities moving through an empty plenum, rather than as a wave in a medium. (The word “empty” essentially stands for acausal, which Einstein will later use in his deductions.) Needless to say, this is consonant with Einstein’s conception of light from his recently completed paper on light quanta. Indeed, the two papers can almost be seen as a single paper, establishing a theoretical framework compatible with the dual properties of light.

Following the overview in the introductory comments, the paper begins with Part 1, called the “Kinematic Part”. This title has led to some confusion over the years, because it gives the impression that this part of the paper deals only with kinematics, whereas in fact the very first sentence in this section is

Consider a coordinate system in which the Newtonian mechanical equations are valid.

Strictly speaking, the word “kinematics” refers to purely tautological relations between specified coordinate systems, without regard to inertia or any other physical law, whereas Einstein’s opening sentence refers explicitly to the laws of inertia, so we are clearly outside the realm of kinematics from the very start of this section. This led critics like Larmor to complain that Einstein’s theory consisted of “dynamics masquerading as kinematics”. The title is (somewhat) justified only because, once the operational definition of inertial coordinate systems has been given (in Section 1) and the relationship between them has been established (completed in Section 3), many interesting consequences follow from pure kinematics. However, we must not lose sight of the fact that Lorentz invariance (which underlies these kinematics) is not a kinematical fact, nor is it tautological. It is a physical dynamical fact that can be tested. (One might also say that special relativity blurs the line between kinematics and dynamics, if we regard the metric of spacetime as efficacious.)

Another difficulty immediately presents itself: Einstein asks us to consider a system of coordinates in which Newton’s equations are valid, and yet by the end of the paper he has established that Newton’s equations are not valid in any system of coordinates. (For example, he gives the non-Newtonian expression for kinetic energy.) Does the paper thereby invalidate its own starting point? The opening sentence obviously needs to be qualified, but, as we will see, this is easily done, and the reasoning of the paper is unaffected. In fact, with the necessary qualification, the paper could have been significantly simplified and the foundations of the theory clarified.

First, we need to recognize a shift in the meaning of “coordinate system” that takes place within the paper, as part of the overall conceptual “step” that the paper is laboring to convey. At the end of the introductory comments just prior to Section 1, Einstein says “the assertions of any such theory have to do with the relationships between rigid bodies (systems of coordinates), clocks, and electromagnetic processes”. Thus the term “system of coordinates” refers (at this stage) to space coordinates, which behave like a grid of solid unaccelerated measuring rods. Time was regarded as a separate independent and universal parameter, not a coordinate.

The ambiguity (especially for a modern reader) is reflected in the two different definitions of inertial coordinate systems. The old naïve definition invoked just the first law (a deficient definition that has been carried over uncritically into a surprising number of modern texts), whereas the new full definition invokes all of Newton’s laws. Under the old definition, a system of coordinates need only be unaccelerated, which allows the time coordinate to be skewed arbitrarily, so it is not a complete definition. Once special relativity is recognized, we see that inertial coordinate systems require not only the first law, but the isotropy of inertia entailed by Newton’s third law (with the understanding that there is no action at a distance, and all momentum-carrying entities, including fields, must be taken into account). The second law then follows redundantly.

The fact that Einstein is using the term “coordinate system” to refer just to space coordinates is confirmed by the words immediately following the introduction of the "stationary" system of [space] coordinates. Einstein says "If we wish to describe the motion of a material point, we give the values of its coordinates as functions of the time". Again this shows that the coordinates here are just the space coordinates, because he's giving them as a function of "the time". He hasn't yet started to refer to "the time" as a coordinate. For readers in 1905 this was perfectly consistent with standard usage. So, by this construal, the first sentence just stipulates an unaccelerated system of space coordinates (using the old naïve and ambiguous definition), without clarifying the requisite time coordinate, but then Einstein immediately acknowledges that, in order to talk about motion (as in Newton’s laws of motion, including the third law which implies isotropic inertia) we need to specify how the time is defined as well. He writes

Now we must bear carefully in mind that a mathematical description of this kind has no physical meaning unless we are quite clear as to what we understand by "time".

Following this he describes what we are to understand by "the time", stipulating that we will choose a time coordinate such that the speed of light is isotropic. He concludes Section 1 by saying "it is essential to have time defined by stationary clocks [with isotropic light speed]... we call it 'the time of the stationary system'". In summary, Section 1 has defined the suitable systems of coordinates as those with unaccelerated space coordinates and a time coordinate chosen such that the speed of light is isotropic.

Since the first sentence of Section 1 ostensibly serves just to stipulate an unaccelerated system of space coordinates, one might think Einstein should just have referred to Newton’s first law, consistent with how inertial coordinate systems were traditionally defined (up to that time). However, in a confusing irony, the reference to all of Newton’s equations is, in some way, absolutely essential to the argument of the paper, because the full space-time coordinate system constructed in Section 1, including the time coordinate using light speed isotropy, is indeed a system of coordinates in which all of Newton’s equations of mechanics hold good – at least in the low speed limit. (Einstein tacitly assumes that the correct equations of mechanics, in terms of a suitable system of coordinates, reduce to Newton’s equations in the low speed limit.) This is arguably the most important insight of the paper, so it would have been better if Einstein had been more explicit about it. He never acknowledges the operational definition of simultaneity based on isotropy of mechanical inertia implicit in Newton’s third law in the low speed limit, and the identity of this definition with the simultaneity based on isotropic light speed, but the first sentence in Section 1 makes this crucial connection, albeit awkwardly.

Incidentally, the definition of simultaneity based on isotropy of mechanical inertia is sometimes expressed in terms of “slow clock transport”, but that is a poor approach, because a “clock” is a complicated high-level entity. All that’s necessary to synchronize separate clocks using mechanics is the inertial isotropy implied by Newton’s third law (in the low speed limit). For example, if we shoot identical bullets from identical guns at rest at the mid-point of a measuring rod, the bullets arrive at the endpoints simultaneously according to the isotropy of the third law, so they can be used to synchronize clocks. The key insight of special relativity is that synchronization based on the isotropy of mechanical inertia is exactly the same as synchronization by light speed isotropy. This should not be surprising, because all forms of energy (including light, per the photo-electric paper) have inertia, and hence all forms of energy, including light and bullets, satisfy the very same laws of inertia.

In view of this, we see that the conflict between the first sentence of Section 1, asserting that Newton’s equations are valid in inertial coordinate systems, and the paper’s later non-Newtonian results could be repaired by restricting that sentence to the low speed limit. Indeed when the paper was reprinted in 1913 (along with some of the other early papers on relativity), a footnote appeared for the key sentence: Consider a coordinate system in which the Newtonian mechanical equations are valid [i.e., to the first approximation]. This footnote has sometimes been attributed to Arnold Sommerfeld, who edited the re-printed papers, but others have questioned the attribution. The extreme brevity of the correction might suggest that it was added by Einstein himself, since authors sometimes strive to minimize their own oversights. In any case, whether Einstein added the note himself or it was added by someone else, Einstein surely approved it. The footnote was carried over into many later re-printings of the paper, and Einstein never objected to it.

Amusingly, Pais tells how Einstein agreed to write out, in long-hand, the entire paper to be auctioned to raise money for the war effort in 1943. (The organizers had hoped to offer the original manuscript, but Einstein told them he threw it away in 1905.) To prepare the handwritten copy, Helen Dukas read the paper aloud, and Einstein wrote it out. At some points he was dissatisfied with the exposition. He stopped and asked “Did I really say that?”, and commented “I could have said that more simply”. It would be interesting to know which version Dukas used as the source, but it seems likely that it was one of the accessible re-prints with the footnote [i.e., to the first approximation] attached to the first sentence of Section 1. The hand-written copy currently resides in the Library of Congress.

Once the footnote is included, another question arises. Had Einstein defined “coordinate system” to include the time coordinate, as strictly he should have in order to support the assertion that equations of motion are satisfied, then the first sentence in Section 1, with the footnote, would actually have been sufficient to fully define the system of inertial (space-time) coordinates. In that case the entire discussion taking up most of Section 1, establishing the time coordinate of the “stationary” system by means of light signals that synchronize separate mutually stationary ideal clocks, would be redundant. Einstein describes a pulse of light emitted from a clock at location A when that clock reads t_A, reflecting off a clock at B when that clock reads t_B, and arriving back at A when that clock reads t′_A. To synchronize the clocks he sets t_B to the average of t_A and t′_A. But this is redundant to the mechanical definition. Of course, it’s essential to the argument of the paper that this equivalence between mechanical and electromagnetic synchronization be established, but this is accomplished by the postulates at the beginning of Section 2, which tell us that every inertial coordinate system (space and time), defined as explained in Section 1, is equivalent for the expression of physical laws, and that a pulse of light propagates (in vacuum) at c in terms of such a coordinate system, independent of the motion of the source. Taken together, Einstein argues, these postulates imply that this holds not just for his so-called “stationary” system, but for all systems of inertial space and time coordinates. This establishes the identity between mechanical and electromagnetic relativity and avoids the seemingly arbitrary and circular definition of the coordinates based on isotropic light speed followed by the postulate that light speed is isotropic in terms of such coordinate systems.
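The light-signal synchronization described in the preceding paragraph can be written compactly. This is only a sketch in the notation used here: the symbol r_AB for the distance between the two clocks is an assumption introduced for illustration, and the second relation expresses the round-trip speed, which can be measured with the single clock at A.

```latex
% Synchronization criterion: outbound and return travel times are equal
t_B - t_A = t'_A - t_B
\quad\Longleftrightarrow\quad
t_B = \tfrac{1}{2}\left(t_A + t'_A\right)

% Round-trip light speed over the distance r_{AB} (illustrative symbol)
\frac{2\, r_{AB}}{t'_A - t_A} = c
```

The first relation is the stipulation that fixes the setting of the clock at B. The second involves only readings of the clock at A, so it is an empirical statement rather than a convention.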

It’s been frequently noted that Einstein’s light speed postulate (or principle) in the 1905 paper is different from the one commonly used in modern expositions, and even in Einstein’s own later expositions. Modern texts, after first defining inertial coordinate systems (with both space and time coordinates), typically postulate (1) that all such systems are equivalent for the formulation of physical laws, and (2) that light propagates in vacuum in every such system at the fixed speed c. The first postulate is the same as in Einstein’s 1905 paper, but the second is different. In 1905 Einstein chose as a second postulate the proposition (2’) that there exists one particular system of inertial coordinates, which he called the “stationary” system for convenience, in terms of which light propagates in empty space at the fixed speed c, independent of the speed of the source. Superficially this appears to be weaker than (2), but later in the paper Einstein simply asserts that the combination of (1) and (2’) implies (2). However, this implication is not very persuasive, because, as discussed above, it relies on the premise that “empty space” exists and that light moves through it. Furthermore, the claim that (2) follows from (1) and (2’) relies on the premise that “empty space” cannot distinguish between different states of motion, presumably because of its emptiness, i.e., its being devoid of anything that could supply a sufficient reason for such a distinction. Granting this premise, if light propagates at speed c, independent of the speed of the source, in one system in empty space, then by the relativity principle it must propagate at speed c in empty space in every inertial coordinate system. This is not spelled out explicitly in Einstein’s 1905 paper, but later writers recognized that Einstein’s reasoning was a roundabout (and somewhat dubious) way of simply assuming (2).

Already in his 1907 review article Einstein had dropped the “stationary system” of the 1905 presentation, and instead expressed the two postulates as

The principle of relativity: “The physical laws are independent of the state of motion of the reference system, at least if the system is not accelerated.”

The principle of the constancy of the velocity of light: “Clocks [in an inertial coordinate system] can be adjusted in such a way that the propagation velocity of every light ray in vacuum – measured by means of these clocks – becomes everywhere equal to a constant c, provided that the coordinate system is not accelerated.”

So he has already adopted the “modern” presentation, postulating that light propagates at c in every inertial coordinate system, even though he is still using the phrase “coordinate system” to just refer to the space coordinates.

Another point of interest in the 1905 paper is that Einstein does not actually derive the Lorentz covariance of Maxwell’s equations in Part 2 (The Electrodynamical Part) of the paper. Instead, he stipulates Maxwell’s equations in standard form for the “stationary system”, then applies the Lorentz transformation (deduced in Part 1) to give the transformed versions of Maxwell’s equations, and from this infers what the transformation of the electric and magnetic components must be in order for the transformed equations to have the same form as the original equations. (In doing this, he repeats the reasoning by which we conclude that the scale factor is unity.) This is why Lorentz commented that “Einstein merely assumes what we have derived” about the electromagnetic field. However, Einstein’s axiomatic approach easily gives the correct relativistic form of the current density, which Lorentz had missed in his 1904 paper. Moreover, the discussion of electrodynamics is just one application of the kinematics arising from the recognition that inertial coordinate systems – the natural measures of space and time for any given frame – are related by Lorentz transformations, not Galilean transformations.
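For reference, the transformation of the field components that results from this procedure can be written as follows. This is a sketch in modern notation rather than the notation of the 1905 paper: it assumes Gaussian units, relative motion with speed v along the x axis, and the usual abbreviation γ.

```latex
% Field components in the moving system, for relative speed v along x
% (Gaussian units; \gamma = 1/\sqrt{1 - v^2/c^2})
\begin{aligned}
E'_x &= E_x, & B'_x &= B_x,\\
E'_y &= \gamma\left(E_y - \tfrac{v}{c}\,B_z\right), &
B'_y &= \gamma\left(B_y + \tfrac{v}{c}\,E_z\right),\\
E'_z &= \gamma\left(E_z + \tfrac{v}{c}\,B_y\right), &
B'_z &= \gamma\left(B_z - \tfrac{v}{c}\,E_y\right).
\end{aligned}
```

With these assignments the vacuum equations take the same form in the primed coordinates as in the original coordinates.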

Incidentally, people sometimes wonder how we know that Maxwell's equations are not invariant under Galilean transformations for some particular choice for the transformed electric and magnetic field components. This is actually immediate, because Maxwell’s equations in any inertial coordinate system in which they are valid imply that E and B in vacuum (e.g., for a plane wave) each individually satisfy the wave equation with velocity parameter c in terms of those coordinates, which is manifestly not invariant under Galilean transformation. (One oddity of Einstein’s 1905 presentation is that he first deduces from his postulates that light propagates at c in terms of every system of inertial coordinates, and then performs a consistency check on this fact, which sometimes causes people to confuse the consistency check with the deduction.)
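The non-invariance of the wave equation under Galilean transformation can be checked in a few lines. The following is a sketch restricted to one space dimension, with φ standing (as an illustrative assumption) for any single field component, or any fixed linear combination of components, of a plane wave in vacuum.

```latex
% Wave equation in the original coordinates (x, t)
\frac{\partial^2 \phi}{\partial x^2}
  - \frac{1}{c^2}\frac{\partial^2 \phi}{\partial t^2} = 0

% Galilean transformation and the resulting derivative operators
x' = x - vt, \qquad t' = t
\quad\Longrightarrow\quad
\frac{\partial}{\partial x} = \frac{\partial}{\partial x'}, \qquad
\frac{\partial}{\partial t} = \frac{\partial}{\partial t'}
  - v\,\frac{\partial}{\partial x'}

% Wave equation expressed in the transformed coordinates (x', t')
\left(1 - \frac{v^2}{c^2}\right)\frac{\partial^2 \phi}{\partial x'^2}
  + \frac{2v}{c^2}\,\frac{\partial^2 \phi}{\partial x'\,\partial t'}
  - \frac{1}{c^2}\frac{\partial^2 \phi}{\partial t'^2} = 0
```

The transformed equation has the original form only for v = 0. Its solutions propagate at c − v in one direction and c + v in the other, and since the coefficients are constants, no linear recombination of the field components can restore the original form.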

In the concluding Section 10 of the paper Einstein analyzes the “dynamics of the slowly accelerated electron”, and develops expressions for the longitudinal and transverse mass of a charged particle (noting that these expressions depend on the definitions of “force” and “acceleration”), and then casually remarks that

These results are also valid for ponderable material particles, because a ponderable material particle can be made into an electron (in our sense of the word) by the addition of an electric charge, no matter how small.

With these words he overturns Newtonian mechanics. He goes on to give the relativistic expression for kinetic energy, showing that the work required to accelerate any material object to the speed of light would be infinite.
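The kinetic energy expression mentioned above can be stated explicitly. This is a sketch in modern symbols (an assumption of notation, not a transcription of the paper's own symbols), with m the mass of the particle and v its speed in the “stationary” system.

```latex
% Relativistic kinetic energy of a particle of mass m moving at speed v
W = m c^2 \left( \frac{1}{\sqrt{1 - v^2/c^2}} - 1 \right)

% Expansion for v << c: the leading term is the Newtonian expression
W = \tfrac{1}{2}\, m v^2 + \tfrac{3}{8}\, \frac{m v^4}{c^2} + \cdots

% The required work grows without bound as v approaches c
\lim_{v \to c} W = \infty
```

The expansion makes the qualification “to the first approximation” concrete: the Newtonian kinetic energy is the leading term, and the corrections are of relative order v²/c².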
