| You are viewing the html version of General Relativity, by Benjamin Crowell. This version is only designed for casual browsing, and may have some formatting problems. For serious reading, you want the printer-friendly Adobe Acrobat version. (c) 1998-2009 Benjamin Crowell, licensed under the Creative Commons Attribution-ShareAlike license. Photo credits are given at the end of the Adobe Acrobat version. |

a / Hendrik Antoon Lorentz (1853-1928)

b / Objects are released at rest at spacetime events P and Q. They remain at rest, and their world-lines define a notion of parallelism.

c / There is no well-defined angular measure in this geometry. In a different frame of reference, the angles are not right angles.

d / Simultaneity is not well defined. The constant-time lines PQ and RS from figure b are not constant-time lines when observed in a different frame of reference.

e / Construction of an affine parameter.

f / Affine geometry gives a well-defined centroid for the triangle.

g / Example 3. The area of the viola can be determined by counting the parallelograms formed by the lattice. The area can be determined to any desired precision, by dividing the parallelograms into fractional parts that are as small as necessary.

h / Example 4.
The geometrical treatment of space, time, and gravity only requires as its basis the equivalence of inertial and gravitational mass. That equivalence holds for Newtonian gravity, so it is indeed possible to redo Newtonian gravity as a theory of curved spacetime. This project was carried out by the French mathematician Cartan, as summarized very readably in section 17.5 of The Road to Reality by Roger Penrose. The geometry of the local reference frames is very simple. The three space dimensions have an approximately Euclidean geometry, and the time dimension is entirely separate from them. This is referred to as a Euclidean spacetime with 3+1 dimensions. Although the outlook is radically different from Newton's, all of the predictions of experimental results are the same.
The experiments in section 1.2 show, however, that there are real, experimentally verifiable violations of Newton's laws. In Newtonian physics, time is supposed to flow at the same rate everywhere, which we have found to be false. The flow of time is actually dependent on the observer's state of motion through space, which shows that the space and time dimensions are intertwined somehow. The geometry of the local frames in relativity therefore must not be as simple as Euclidean 3+1. Their actual geometry was implicit in Einstein's 1905 paper on special relativity, and had already been developed mathematically, without the full physical interpretation, by Hendrik Lorentz. Lorentz's and Einstein's work were explicitly connected by Minkowski in 1907, so a Lorentz frame is often referred to as a Minkowski frame.
To describe this Lorentz geometry, we need to add more structure on top of the axioms O1-O4 of ordered geometry, but it will not be the additional Euclidean structure of E3-E4, it will be something different.
To see how to proceed, let's consider the bare minimum of geometrical apparatus that would be necessary in order to set up frames of reference. The following argument shows that the main missing ingredient is merely a concept of parallelism. We only expect Lorentz frames to be local, but we do need them to be big enough to cover at least some amount of spacetime. If Betty does an Eötvös experiment by releasing a pencil and a lead ball side by side, she is essentially trying to release them at the same event A, so that she can observe them later and determine whether their world-lines stay right on top of one another at point B. That was all that was required for the Eötvös experiment, but in order to set up a Lorentz frame we need to start dealing with objects that are not right on top of one another. Suppose we release two lead balls in two different locations, at rest relative to one another. This could be the first step toward adding measurement to our geometry, since the balls mark two points in space that are separated by a certain distance, like two marks on a ruler, or the goals at the ends of a soccer field. Although the balls are separated by some finite distance, they are still close enough together so that if there is a gravitational field in the area, it is very nearly the same in both locations, and we expect the distance defined by the gap between them to stay the same. Since they are both subject only to gravitational forces, their world-lines are by definition straight lines (geodesics). The goal here is to end up with some kind of coordinate grid defining a (t,x) plane, and on such a grid, the two balls' world-lines are vertical lines. If we release them at events P and Q, then observe them again later at R and S, PQRS should form a rectangle on such a plot. 
In the figure, the irregularly spaced tick marks along the edges of the rectangle are meant to suggest that although ordered geometry provides us with a well-defined ordering along these lines, we have not yet constructed a complete system of measurement.
The depiction of PQSR as a rectangle, with right angles at its vertices, might lead us to believe that our geometry would have something like the concept of angular measure referred to in Euclid's E4, equality of right angles. But this is too naive even for the Euclidean 3+1 spacetime of Newton and Galileo. Suppose we switch to a frame that is moving relative to the first one, so that the balls are not at rest. In the Euclidean spacetime, time is absolute, so events P and Q would remain simultaneous, and so would R and S; the top and bottom edges PQ and RS would remain horizontal on the plot, but the balls' world-lines PR and QS would become slanted. The result would be a parallelogram. Since observers in different states of motion do not agree on what constitutes a right angle, the concept of angular measure is clearly not going to be useful here. Similarly, if Euclid had observed that a right angle drawn on a piece of paper no longer appeared to be a right angle when the paper was turned around, he would never have decided that angular measure was important enough to be enshrined in E4.
In the context of relativity, where time is not absolute, there is not even any reason to believe that different observers must agree on the simultaneity of PQ and RS. Our observation that time flows differently depending on the observer's state of motion tells us specifically to expect this not to happen when we switch to a frame moving relative to the first one. Thus in general we expect that PQRS will be distorted into a form like the one shown in figure d. We do expect, however, that it will remain a parallelogram; a Lorentz frame is one in which the gravitational field, if any, is constant, so the properties of spacetime are uniform, and by symmetry the new frame should still have PR=QS and PQ=RS.
With this motivation, we form the system of affine geometry by adding the following axioms to set O1-O4.1 The notation [PQRS] means that events P, Q, S, and R form a parallelogram, and is defined as the statement that the lines determined by PQ and RS never meet at a point, and similarly for PR and QS.
The following theorem is a stronger version of Playfair's axiom E5, the interpretation being that affine geometry describes a spacetime that is locally flat.
Theorem: Given any line ℓ and any point P not on the line, there exists a unique line through P that is parallel to ℓ.
This is stronger than E5, which only guarantees uniqueness, not existence. Informally, the idea here is that A1 guarantees the existence of the parallel, and A3 makes it unique.2
Although these new axioms do nothing more than to introduce the concept of parallelism lacking in ordered geometry, it turns out that they also allow us to build up a concept of measurement. Let ℓ be a line, and suppose we want to define a number system on this line that measures how far apart events are. Depending on the type of line, this could be a measurement of time, of spatial distance, or a mixture of the two. First we arbitrarily single out two distinct points on ℓ and label them 0 and 1. Next, pick some auxiliary point q₀ not lying on ℓ. By A1, construct the parallelogram 01q₀q₁. Next construct q₀1q₁2. Continuing in this way, we have a scaffolding of parallelograms adjacent to the line, determining an infinite lattice of points 1, 2, 3, ... on the line, which represent the positive integers. Fractions can be defined in a similar way. For example, 1/2 is defined as the point such that when the initial lattice segment from 0 to 1/2 is extended by the same construction, the next point on the lattice is 1.
The continuously varying variable constructed in this way is called an affine parameter. The time measured by a free-falling clock is an example of an affine parameter, as is the distance measured by the tick marks on a free-falling ruler. Since light rays travel along geodesics, the wave crests on a light wave can even be used analogously to the ruler's tick marks.
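The parallelogram scaffolding can be imitated numerically. The following is only a sketch under stated assumptions: points are modeled as coordinate pairs (affine geometry itself has no preferred coordinates), and the helper names `complete_parallelogram` and `lattice_points` are hypothetical, with the first one standing in for axiom A1.

```python
def complete_parallelogram(p, q, r):
    """Return the point s such that the step from r to s equals the step from p to q.

    This plays the role of axiom A1; the use of coordinates is an assumption
    of this sketch, not part of the axioms.
    """
    return (r[0] + q[0] - p[0], r[1] + q[1] - p[1])


def lattice_points(zero, one, q0, n):
    """Build the points labeled 0, 1, ..., n on the line through `zero` and `one`."""
    points = [zero, one]
    q_prev = q0
    for _ in range(n - 1):
        # Parallelogram (k-1) k q_(k-1) q_k gives the next auxiliary point q_k.
        q_next = complete_parallelogram(points[-2], points[-1], q_prev)
        # Parallelogram q_(k-1) k q_k (k+1) gives the next lattice point k+1.
        points.append(complete_parallelogram(q_prev, points[-1], q_next))
        q_prev = q_next
    return points


if __name__ == "__main__":
    # The auxiliary point q0 is arbitrary; the lattice on the line should not depend on it.
    print(lattice_points((0, 0), (2, 1), (0, 5), 4))
    print(lattice_points((0, 0), (2, 1), (7, -3), 4))
```

Both calls should produce the same evenly spaced points on the line, which illustrates that the choice of the auxiliary point does not affect the resulting parameter.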
The affine parameter can be used to define the centroid of a set of points. In the simplest example, finding the centroid of two points, we simply bisect the line segment as described above in the construction of the number 1/2. Similarly, the centroid of a triangle can be defined as the intersection of its three medians, the lines joining each vertex to the midpoint of the opposite side.
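As a numerical illustration of why the centroid is an affine notion, the sketch below finds the intersection of two medians and checks that it is carried along consistently by an affine map. The coordinates, the helper names, and the particular shear are all assumptions chosen for illustration.

```python
def midpoint(a, b):
    """Midpoint of the segment ab."""
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1p2 with line p3p4 (the lines are assumed not parallel)."""
    x1, y1 = p1
    x2, y2 = p2
    x3, y3 = p3
    x4, y4 = p4
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)


def centroid(a, b, c):
    """Intersection of the median from a with the median from b."""
    return line_intersection(a, midpoint(b, c), b, midpoint(a, c))


def shear(p, k=0.75):
    """An arbitrary affine map (a shear), used only as a test transformation."""
    return (p[0] + k * p[1], p[1])


if __name__ == "__main__":
    a, b, c = (0.0, 0.0), (6.0, 0.0), (0.0, 3.0)
    print(centroid(a, b, c))                       # centroid of the original triangle
    print(centroid(shear(a), shear(b), shear(c)))  # centroid of the sheared triangle
    print(shear(centroid(a, b, c)))                # sheared image of the original centroid
```

The last two printed points should agree: transforming the triangle and then finding the centroid gives the same point as finding the centroid and then transforming it.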
In nonrelativistic mechanics, the concept of the center of mass is closely related to the law of conservation of momentum. For example, a logically complete statement of the law is that if a system of particles is not subjected to any external force, and we pick a frame in which its center of mass is initially at rest, then its center of mass remains at rest in that frame. Since centroids are well defined in affine geometry, and Lorentz frames have affine properties, we have grounds to hope that it might be possible to generalize the definition of momentum relativistically so that the generalized version is conserved in a Lorentz frame. On the other hand, we don't expect to be able to define anything like a global Lorentz frame for the entire universe, so there is no such natural expectation of being able to define a global principle of conservation of momentum. This is an example of a general fact about relativity, which is that conservation laws are difficult or impossible to formulate globally.
Although the affine parameter gives us a system of measurement for free in a geometry whose axioms do not even explicitly mention measurement, there are some restrictions:
We will eventually want to lift some of these restrictions by adding to our kit a tool called a metric, which allows us to define distances along arbitrary curves in spacetime, and to compare distances in different directions. The affine parameter, however, will not be entirely superseded. In particular, we'll find that the metric has a couple of properties that are not as nice as those of the affine parameter. The square of a metric distance can be negative, and the metric distance measured along a light ray is precisely zero, which is not very useful.
Self-check: By the construction of the affine parameter above, affine distances on the same line are comparable. By another construction, verify the claim made above that this can be extended to distances measured along two different parallel lines.
Self-check: If multiplication is defined in terms of affine area, prove the commutative property ab=ba and the distributive rule a(b+c)=ab+ac from axioms A1-A3.

a / Two objects at rest have world-lines that define a rectangle. In a second frame of reference in motion relative to the first one, the rectangle becomes a parallelogram.

c / Unit square PQRS is Lorentz-boosted to the parallelogram P'Q'R'S'.

e / Example 5. Flashes of light travel along P'T' and Q'T'. The observer in this frame of reference judges them to have been emitted at different times, and to have traveled different distances.

g / Muons accelerated to nearly c undergo radioactive decay much more slowly than they would according to an observer at rest with respect to the muons. The first two data-points (unfilled circles) were subject to large systematic errors.

h / The change in the frequency of x-ray photons emitted by 57Fe as a function of temperature, drawn after Pound and Rebka (1960). Dots are experimental measurements. The solid curve is Pound and Rebka's theoretical calculation using the Debye theory of the lattice vibrations with a Debye temperature of 420 degrees C. The dashed line is one with the slope calculated in the text using a simplified treatment of the thermodynamics. There is an arbitrary vertical offset in the experimental data, as well as the theoretical curves.

i / The light cone in 2+1 dimensions.

j / The circle plays a privileged role in Euclidean geometry. When rotated, it stays the same. The pie slice is not invariant as the circle is. A similar privileged place is occupied by the light cone in Lorentz geometry. Under a Lorentz boost, the spacetime parallelograms change, but the light cone doesn't.
We now want to pin down the properties of the Lorentz geometry that are left unspecified by the affine treatment. This can be approached either by looking for an appropriate metric, or by finding the appropriate rules for distorting parallelograms when switching from one frame of reference to another frame that is in motion relative to the first. In either case, we need some further input from experiments in order to show us how to proceed. We take the following as empirical facts about flat spacetime:3
Define affine parameters t and x for time and position, and construct a (t,x) plane. Although affine geometry treats all directions symmetrically, we're going beyond the affine aspects of the space, and t does play a different role than x here, as shown, for example, by L4 and L5.
In the (t,x) plane, consider a rectangle with one corner at the origin O. We can imagine its right and left edges as representing the world-lines of two objects that are both initially at rest in this frame; they remain at rest (L2), so the right and left edges are parallel.
We now define a second frame of reference such that the origins of the two frames coincide, but they are in motion relative to one another with velocity v. The transformation L from the first frame to the second is referred to as a Lorentz boost with velocity v. L depends on v.
By homogeneity of spacetime (L1), L must be linear, so the original rectangle will be transformed into a parallelogram in the new frame; this is also consistent with L3, which requires that the world-lines on the right and left edges remain parallel. The left edge has inverse slope v. By L5 (no simultaneity), the top and bottom edges are no longer horizontal.
For simplicity, let the original rectangle have unit area. Then the area of the new parallelogram is still 1, by the following argument. Let the new area be A, which is a function of v. By isotropy of spacetime (L1), A(v)=A(-v). Furthermore, the function A(v) must have some universal form for all geometrical figures, not just for a figure that is initially a particular rectangle; this follows because of our definition of affine area in terms of a dissection by a two-dimensional lattice, which we can choose to be a lattice of squares. Applying boosts +v and -v one after another results in a transformation back into our original frame of reference, and since A is universal for all shapes, it doesn't matter that the second transformation starts from a parallelogram rather than a square. Scaling the area once by A(v) and again by A(-v) must therefore give back the original square with its original unit area, A(v)A(-v)=1, and since A(v)=A(-v), A(v)=±1 for any value of v. Since A(0)=1, we must have A(v)=1 for all v. The argument is independent of the shape of the region, so we conclude that all areas are preserved by Lorentz boosts. The argument is also purely one about affine geometry (it would apply equally well to a Euclidean space), so there is no reason to expect the area A in the (t,x) plane to have any special physical significance in relativity; it is simply a useful mathematical tool in the present discussion.
If we consider a boost by an infinitesimal velocity dv, then the vanishing change in area comes from the sum of the areas of the four infinitesimally thin slivers where the rectangle lies either outside the parallelogram (call this negative area) or inside it (positive). (We don't worry about what happens near the corners, because such effects are of order dv².) In other words, area flows around in the x-t plane, and the flows in and out of the rectangle must cancel. Let v be positive; the flow at the sides of the rectangle is then to the right. The flows through the top and bottom cannot be in opposite directions (one up, one down) while maintaining the parallelism of the opposite sides, so we have the following three possible cases:
b / Flows of area: (I) a shear that preserves simultaneity, (II) a rotation, (III) upward flow at all edges.
Only case III is possible, and given case III, there must be at least one point P in the first quadrant where area flows neither clockwise nor counterclockwise.6 The boost simply increases P's distance from the origin by some factor. By the linearity of the transformation, the entire line running through O and P is simply rescaled. This special line's inverse slope, which has units of velocity, apparently has some special significance, so we give it a name, c. We'll see later that c is the maximum speed of cause and effect whose existence we inferred in section 1.3. Any world-line with a velocity equal to c retains the same velocity as judged by moving observers, and by isotropy the same must be true for -c.
For convenience, let's adopt time and space units in which c=1, and let the original rectangle be a unit square. The upper right tip of the parallelogram must slide along the line through the origin with slope +1, and similarly the parallelogram's other diagonal must have a slope of -1. Since these diagonals bisected one another on the original square, and since bisection is an affine property that is preserved when we change frames of reference, the parallelogram must be equilateral.
We can now determine the complete form of the Lorentz transformation. Let unit square PQRS, as described above, be transformed to parallelogram P'Q'R'S' in the new coordinate system (x',t'). Let the t' coordinate of R' be γ, interpreted as the ratio between the time elapsed on a clock moving from P' to R' and the corresponding time as measured by a clock that is at rest in the (x',t') frame. By the definition of v, R' has coordinates (vγ,γ), and the other geometrical facts established above place Q' symmetrically on the other side of the diagonal, at (γ,vγ). Computing the cross product of vectors P'R' and P'Q', we find the area of P'Q'R'S' to be γ²(1-v²), and setting this equal to 1 gives

$$\gamma = \frac{1}{\sqrt{1-v^2}}$$

Self-check: Interpret the dependence of γ on the sign of v.

d / The behavior of the γ factor.
The result for the transformation L, a Lorentz boost along the x axis with velocity v, is:

$$\begin{aligned} t' &= \gamma t + v\gamma x \\ x' &= v\gamma t + \gamma x \end{aligned}$$

The symmetry of P'Q'R'S' with respect to reflection across the diagonal indicates that the time and space dimensions are treated symmetrically, although they are not entirely interchangeable as they would have been in case II. Although we defined γ in terms of the time coordinate of R', we could just as easily have used the spatial coordinate of Q', so γ represents a factor of both time dilation and length contraction. (Clearly it wouldn't have made sense to distort one quantity without distorting the other, since the invariant velocity c represents a ratio of a distance to a time.) In summary, a clock runs fastest according to an observer who is at rest relative to the clock, and a measuring rod likewise appears longest in its own rest frame.
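The geometrical facts used in the derivation can be checked numerically. The sketch below assumes the boost written as t' = γt + vγx, x' = vγt + γx, which reproduces the coordinates of R' and Q' quoted in the text. The function names and the sample velocity are illustrative choices, not part of the original treatment.

```python
import math


def gamma(v):
    """Lorentz factor in units with c = 1; requires |v| < 1."""
    return 1.0 / math.sqrt(1.0 - v * v)


def boost(v, t, x):
    """Map (t, x) to (t', x') using t' = γ t + vγ x and x' = vγ t + γ x."""
    g = gamma(v)
    return (g * t + v * g * x, v * g * t + g * x)


def area(u, w):
    """Signed area of the parallelogram spanned by the vectors u and w."""
    return u[0] * w[1] - u[1] * w[0]


if __name__ == "__main__":
    v = 0.6  # arbitrary illustrative velocity
    r_prime = boost(v, 1.0, 0.0)   # image of the unit step along the t axis
    q_prime = boost(v, 0.0, 1.0)   # image of the unit step along the x axis
    print("area of boosted unit square:", area(r_prime, q_prime))
    print("image of a point on the light cone:", boost(v, 1.0, 1.0))
    print("boost by +v then -v:", boost(-v, *boost(v, 0.3, -0.7)))
```

The three printed results correspond to three claims in the text: the area of the unit square is preserved, a point on the line of slope 1 is merely rescaled along that line, and boosting by +v and then by -v returns the original coordinates.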
The lack of a universal notion of simultaneity has a similarly symmetric interpretation. In prerelativistic physics, points in space have no fixed identity. A brass plaque commemorating a Civil War battle is not at the same location as the battle, according to an observer who perceives the Earth as having been hurtling through space for the intervening centuries. By symmetry, points in time have no fixed identity either.
In everyday life, we don't notice relativistic effects like time dilation, so apparently γ≈1, and v << 1, i.e., the speed c must be very large when expressed in meters per second. By setting c equal to 1, we have chosen a distance unit that is extremely long in proportion to the time unit. This is an example of the correspondence principle, which states that when a new physical theory, such as relativity, replaces an old one, such as Galilean relativity, it must remain “backward-compatible” with all the experiments that verified the old theory; that is, it must agree with the old theory in the appropriate limit. Despite my coyness, you probably know that the speed of light is also equal to c. It is important to emphasize, however, that light plays no special role in relativity, nor was it necessary to assume the constancy of the speed of light in order to derive the Lorentz transformation; we will in fact prove on page 54 that photons must travel at c, and on page 107 that this must be true for any massless particle.
On the other hand, Einstein did originally develop relativity based on a different set of assumptions than our L1-L5. His treatment, given in his 1905 paper “On the electrodynamics of moving bodies,” is reproduced on p. 263. It starts from the following two postulates:
Einstein's P1 is essentially the same as our L3 (equivalence of inertial frames). He implicitly assumes something equivalent to our L1 (homogeneity and isotropy of spacetime). In his system, our L5 (no simultaneity) is a theorem proved from the axioms P1-P2, whereas in our system, his P2 is a theorem proved from the axioms L1-L5.
Let the intersection of the parallelogram's two diagonals be T in the original (rest) frame, and T' in the Lorentz-boosted frame. An observer at T in the original frame simultaneously detects the passing by of the two flashes of light emitted at P and Q, and since she is positioned at the midpoint of the diagram in space, she infers that P and Q were simultaneous. Since the arrival of both flashes of light at the same point in spacetime is a concrete event, an observer in the Lorentz-boosted frame must agree on their simultaneous arrival. (Simultaneity is well defined as long as no spatial separation is involved.) But the distances traveled by the two flashes in the boosted frame are unequal, and since the speed of light is the same in all cases, the boosted observer infers that they were not emitted simultaneously.
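The argument can be made concrete with numbers. The following sketch places P, Q, and T at the corners and center of the unit square and boosts them with an arbitrarily chosen v = 0.6, in units with c = 1. The boost formula is the one consistent with the coordinates of R' and Q' given earlier, and the helper names are hypothetical.

```python
import math


def boost(v, event):
    """Boost an event (t, x) with c = 1: t' = γ(t + v x), x' = γ(x + v t)."""
    t, x = event
    g = 1.0 / math.sqrt(1.0 - v * v)
    return (g * (t + v * x), g * (x + v * t))


def speed(a, b):
    """Coordinate speed |Δx/Δt| of a signal traveling from event a to event b."""
    return abs((b[1] - a[1]) / (b[0] - a[0]))


v = 0.6  # arbitrary illustrative boost velocity
P, Q, T = (0.0, 0.0), (0.0, 1.0), (0.5, 0.5)  # events as (t, x) in the original frame
Pb, Qb, Tb = (boost(v, e) for e in (P, Q, T))

if __name__ == "__main__":
    print("emission times in boosted frame:", Pb[0], Qb[0])
    print("distances traveled:", abs(Tb[1] - Pb[1]), abs(Tb[1] - Qb[1]))
    print("speeds of the two flashes:", speed(Pb, Tb), speed(Qb, Tb))
```

For this choice of v, the two flashes still travel at speed 1 in the boosted frame, but they cover unequal distances and are emitted at unequal times, as the text describes.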
A different kind of symmetry is the symmetry between observers. If observer A says observer B's time is slow, shouldn't B say that A's time is fast? This is what would happen if B took a pill that slowed down all his thought processes: to him, the rest of the world would seem faster than normal. But this can't be correct for Lorentz boosts, because it would introduce an asymmetry between observers. There is no preferred, “correct” frame corresponding to the observer who didn't take a pill; either observer can correctly consider himself to be the one who is at rest. It may seem paradoxical that each observer could think that the other was the slow one, but the paradox evaporates when we consider the methods available to A and B for resolving the controversy. They can either (1) send signals back and forth, or (2) get together and compare clocks in person. Signaling doesn't establish one observer as correct and one as incorrect, because as we'll see in the following section, there is a limit to the speed of propagation of signals; either observer ends up being able to explain the other observer's observations by taking into account the finite and changing time required for signals to propagate. Meeting in person requires one or both observers to accelerate, as in the original story of Alice and Betty, and then we are no longer dealing with pure Lorentz frames, which are described by non-accelerating observers.
In the Hafele-Keating experiment using atomic clocks aboard airplanes (p. 15), both gravity and motion had effects on the rate of flow of time. Similarly, both effects must be considered in the case of the GPS system. The gravitational effect was found on page 28 to be ΔE/E=gy (with c=1), based on the equivalence principle. The special-relativistic effect can be found from the Lorentz transformation. Let's determine the directions and relative strengths of the two effects in the case of a GPS satellite.
A radio photon emitted by a GPS satellite gains energy as it falls to the earth's surface, so its energy and frequency are increased by this effect. The observer on the ground, after accounting for all non-relativistic effects such as Doppler shifts and the Sagnac effect (p. 59), would interpret the frequency shift by saying that time aboard the satellite was flowing more quickly than on the ground.
However, the satellite is also moving at orbital speeds, so there is a Lorentz time dilation effect. According to the observer on earth, this causes time aboard the satellite to flow more slowly than on the ground.
We can therefore see that the two effects are of opposite sign. Which is stronger?
For a satellite in low earth orbit, we would have v²/r=g, where r is only slightly greater than the radius of the earth. Expanding the Lorentz gamma factor in a Taylor series, we find that the relative effect on the flow of time is γ-1≈v²/2=gr/2. The gravitational effect, approximating g as a constant, is -gy, where y is the satellite's altitude above the earth. For such a satellite, the gravitational effect is down by a factor of 2y/r, so the Lorentz time dilation dominates.
GPS satellites, however, are not in low earth orbit. They orbit at an altitude of about 20,200 km, which is quite a bit greater than the radius of the earth. We therefore expect the gravitational effect to dominate. To confirm this, we need to generalize the equation ΔE/E=gy to the case where g is not a constant. Integrating the equation dE/E = g dy, we find that the time dilation factor is equal to e^ΔΦ, where Φ is the gravitational potential per unit mass. When ΔΦ is small, this causes a relative effect equal to ΔΦ. The total effect for a GPS satellite is thus (inserting factors of c for calculation with SI units, and using positive signs for blueshifts)

$$\frac{1}{c^2}\left(+\Delta\Phi - \frac{v^2}{2}\right)$$

where the first term is gravitational and the second kinematic. A more detailed analysis includes various time-varying effects, but this is the constant part. For this reason, the atomic clocks aboard the satellites are set to a frequency of 10.22999999543 MHz before launching them into orbit; on the average, this is perceived on the ground as 10.23 MHz. A more complete analysis of the general relativity involved in the GPS system can be found in the review article by Ashby.7
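The sizes of the two terms can be estimated with a few lines of arithmetic. This is a rough sketch under several assumptions that are not in the text: a spherical earth with the constants given below, a circular orbit, and a ground clock whose own motion due to the earth's rotation is ignored. The function name is hypothetical.

```python
GM_EARTH = 3.986004e14   # m^3/s^2, earth's gravitational parameter (assumed value)
R_EARTH = 6.371e6        # m, mean radius of the earth (assumed value)
C = 2.99792458e8         # m/s, speed of light
ALTITUDE = 2.02e7        # m, the 20,200 km altitude quoted in the text


def fractional_shifts(altitude=ALTITUDE):
    """Return (gravitational, kinematic) fractional rate shifts, blueshift positive."""
    r = R_EARTH + altitude
    # Gravitational term: potential difference between orbit and ground, divided by c^2.
    gravitational = GM_EARTH * (1.0 / R_EARTH - 1.0 / r) / C**2
    # Kinematic term: v^2 = GM/r for a circular orbit, giving -v^2/(2 c^2).
    kinematic = -GM_EARTH / r / (2.0 * C**2)
    return gravitational, kinematic


if __name__ == "__main__":
    grav, kin = fractional_shifts()
    net = grav + kin
    print("gravitational:", grav)
    print("kinematic:    ", kin)
    print("net:          ", net)
    # Frequency to set before launch so that 10.23 MHz is received on the ground.
    print("preset frequency (Hz):", 10.23e6 * (1.0 - net))
```

Under these assumptions the gravitational term comes out several times larger than the kinematic one, and the preset frequency lands close to the 10.22999999543 MHz figure quoted above. Calling `fractional_shifts(3.0e5)` for a hypothetical 300 km orbit reverses the ordering, in line with the low-earth-orbit estimate.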
Self-check: Suppose that positioning a clock at a certain distance from a certain planet produces a fractional change δ in the rate at which time flows. In other words, the time dilation factor is 1+δ. Now suppose that a second, identical planet is brought into the picture, at an equal distance from the clock. The clock is positioned on the line joining the two planets' centers, so that the gravitational field it experiences is zero. Is the fractional time dilation now approximately 0, or approximately 2δ? Why is this only an approximation?

f / Apparatus used for the test of relativistic time dilation described in example 8. The prominent black and white blocks are large magnets surrounding a circular pipe with a vacuum inside.
(c) 1974 by CERN.
Muons were produced by an accelerator at CERN, near Geneva. A muon is essentially a heavier version of the electron. Muons undergo radioactive decay, lasting an average of only 2.197 μs before they evaporate into an electron and two neutrinos. The 1974 experiment was actually built in order to measure the magnetic properties of muons, but it produced a high-precision test of time dilation as a byproduct. Because muons have the same electric charge as electrons, they can be trapped using magnetic fields. Muons were injected into the ring shown in figure f, circling around it until they underwent radioactive decay. At the speed at which these muons were traveling, they had γ=29.33, so on the average they lasted 29.33 times longer than the normal lifetime. In other words, they were like tiny alarm clocks that self-destructed at a randomly selected time. Figure g shows the number of radioactive decays counted, as a function of the time elapsed after a given stream of muons was injected into the storage ring. The two dashed lines show the rates of decay predicted with and without relativity. The relativistic line is the one that agrees with experiment.
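The two predicted decay curves can be compared with a short calculation. The sketch below uses only the lifetime and γ quoted above and assumes simple exponential decay; the function name is an illustrative choice.

```python
import math

TAU_REST_US = 2.197   # mean lifetime at rest, in microseconds (from the text)
GAMMA = 29.33         # Lorentz factor of the stored muons (from the text)


def surviving_fraction(t_us, relativistic=True):
    """Fraction of muons remaining after a lab time t_us, in microseconds."""
    tau = GAMMA * TAU_REST_US if relativistic else TAU_REST_US
    return math.exp(-t_us / tau)


if __name__ == "__main__":
    dilated = GAMMA * TAU_REST_US
    print("dilated mean lifetime (microseconds):", dilated)
    for t in (10.0, 50.0, 100.0):
        print(t, surviving_fraction(t), surviving_fraction(t, relativistic=False))
```

After a few tens of microseconds the non-relativistic prediction leaves essentially no muons, while the relativistic one leaves a substantial fraction, so the two hypotheses are easy to tell apart experimentally.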
In Pound and Rebka's paper describing their experiment,9 they refer to a preliminary measurement10 in which they carefully measured this effect, showed that it was consistent with theory, and pointed out that a previous claim by Cranshaw et al. of having measured the gravitational frequency shift was vitiated by their failure to control for the temperature dependence.
It turns out that the full Debye treatment of the lattice vibrations is not really necessary near room temperature, so we'll simplify the thermodynamics. At absolute temperature T, the mean translational kinetic energy of each iron nucleus is (3/2)kT. The velocity is much less than c(=1), so we can use the nonrelativistic expression for kinetic energy, K=(1/2)mv², which gives a mean value for v² of 3kT/m. In the limit of v << 1, time dilation produces a change in frequency by a factor of 1/γ, which differs from unity by approximately -v²/2. The relative time dilation is therefore -3kT/2m, or, in metric units, -3kT/2mc². The vertical scale in figure h contains an arbitrary offset, since Pound and Rebka's measurements were the best absolute measurements to date of the frequency. The predicted slope of -3k/2mc², however, is not arbitrary. Plugging in 57 atomic mass units for m, we find its magnitude to be 2.4×10⁻¹⁵ per kelvin, which, as shown in the figure, is an excellent approximation (off by only 10%) near room temperature.
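The slope is a one-line calculation, sketched here in SI units. The physical constants are standard values supplied for illustration, and the function name is hypothetical.

```python
K_B = 1.380649e-23      # J/K, Boltzmann constant
AMU = 1.66053907e-27    # kg, atomic mass unit
C = 2.99792458e8        # m/s, speed of light


def thermal_shift_slope(mass_amu=57.0):
    """Slope of the fractional frequency shift with temperature, -3k/(2 m c^2), in 1/K."""
    return -3.0 * K_B / (2.0 * mass_amu * AMU * C**2)


if __name__ == "__main__":
    print("slope per kelvin:", thermal_shift_slope())
```

The result is negative, meaning that warmer nuclei emit at a slightly lower frequency, and its magnitude matches the figure quoted above to two significant digits.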

a / Light cones tip over for two reasons in general relativity: because of the presence of masses, which have gravitational fields, and because of the cosmological constant. The time and distance scales in the bottom figure are many orders of magnitude greater than those in the top.

b / Example 10. Matter is lifted out of a Newtonian black hole with a bucket. The dashed line represents the point at which the escape velocity equals the speed of light.
Given an event P, we can now classify all the causal relationships in which P can participate. In Newtonian physics, these relationships fell into two classes: P could potentially cause any event that lay in its future, and could have been caused by any event in its past. In a Lorentz spacetime, we have a trichotomy rather than a dichotomy. There is a third class of events that are too far away from P in space, and too close in time, to allow any cause and effect relationship, since causality's maximum velocity is c. Since we're working in units in which c=1, the boundary of this set is formed by the lines with slope ±1 on a (t,x) plot. This is referred to as the light cone, and in the generalization from 1+1 to 3+1 dimensions, it literally becomes a (four-dimensional) cone. The terminology comes from the fact that light happens to travel at c, the maximum speed of cause and effect. If we make a cut through the cone defined by a surface of constant time in P's future, the resulting section is a sphere (analogous to the circle formed by cutting a three-dimensional cone), and this sphere is interpreted as the set of events on which P could have had a causal effect by radiating a light pulse outward in all directions.
Events lying inside one another's light cones are said to have a timelike relationship. Events outside each other's light cones are spacelike in relation to one another, and in the case where they lie on the surfaces of each other's light cones the term is lightlike.
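The three-way classification can be written as a short function. This is a minimal sketch in units with c=1; the function name `classify_separation` and the sign convention (time part minus space part) are assumptions made for the illustration.

```python
def classify_separation(dt, dx, dy=0.0, dz=0.0):
    """Classify the relationship between two events, in units with c = 1.

    dt, dx, dy, dz are the coordinate differences between the events.
    """
    # Squared interval, with the (assumed) convention time^2 - space^2.
    interval = dt * dt - (dx * dx + dy * dy + dz * dz)
    if interval > 0:
        return "timelike"     # inside each other's light cones
    if interval < 0:
        return "spacelike"    # outside each other's light cones
    return "lightlike"        # on the surface of the light cone

print(classify_separation(2, 1))      # timelike
print(classify_separation(1, 2))      # spacelike
print(classify_separation(5, 3, 4))   # lightlike
```

With floating-point data an exact zero is unlikely, so a practical version would compare the interval against a small tolerance.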
The light cone plays the same role in the Lorentz geometry that the circle plays in Euclidean geometry. The truth or falsehood of propositions in Euclidean geometry remains the same regardless of how we rotate the figures, and this is expressed by Euclid's E3 asserting the existence of circles, which remain invariant under rotation. Similarly, Lorentz boosts preserve light cones and truth of propositions in a Lorentz frame.
Self-check: Under what circumstances is the time-ordering of events P and Q preserved under a Lorentz boost?
In a uniform Lorentz spacetime, all the light cones line up like soldiers with their axes parallel with one another. When gravity is present, however, this uniformity is disturbed in the vicinity of the masses that constitute the sources. The light cones lying near the sources tip toward the sources. Superimposed on top of this gravitational tipping together, recent observations have demonstrated a systematic tipping-apart effect which becomes significant on cosmological distance scales. The parameter Λ that sets the strength of this effect is known as the cosmological constant. The cosmological constant is not related to the presence of any sources (such as negative masses), and can be interpreted instead as a tendency for space to expand over time on its own initiative. In the present era, the cosmological constant has overpowered the gravitation of the universe's mass, causing the expansion of the universe to accelerate.
Self-check: In the bottom panel of figure a, can an observer look at the properties of the spacetime in her immediate vicinity and tell how much her light cones are tipping, and in which direction? Compare with figure g on page 25.
Imagine a black hole from a Newtonian point of view, as proposed in 1783 by geologist John Michell. Setting the escape velocity equal to the speed of light, we find that this will occur for any gravitating spherical body compact enough to have M/r>c²/2G. (A fully relativistic argument, as given in section 6.2, agrees on M/r ∝ c²/G, which is fixed by units. The correct unitless factor depends on the definition of r, which is flexible in general relativity.) A flash of light emitted from the surface of such a Newtonian black hole would fall back down like water from a fountain, but it would nevertheless be possible for physical objects to escape, e.g., if they were lifted out in a bucket dangling from a cable. If the cable is to support its own weight, it must have a tensile strength per unit density of at least c²/2, which is about ten orders of magnitude greater than that of carbon nanotube fibers. (The factor of 1/2 is not to be taken seriously, since it comes from a nonrelativistic calculation.)
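To get a feeling for the numbers, the Newtonian condition M/r>c²/2G can be turned around to give the radius below which a given mass would trap light. The sketch below uses rounded SI constants and the mass of the sun as an example; the function names are hypothetical, and, as noted above, the factor of 2 comes from a nonrelativistic argument.

```python
G = 6.674e-11       # gravitational constant, m^3 kg^-1 s^-2 (rounded)
C = 2.998e8         # speed of light, m/s (rounded)
M_SUN = 1.989e30    # mass of the sun, kg (rounded)

def michell_radius(mass_kg):
    """Radius at which the Newtonian escape velocity equals c: r = 2GM/c^2."""
    return 2.0 * G * mass_kg / C**2

def minimum_specific_strength():
    """Tensile strength per unit density, c^2/2, in J/kg (Newtonian estimate)."""
    return C**2 / 2.0

print(f"Michell radius of the sun: {michell_radius(M_SUN) / 1000:.1f} km")
print(f"Required strength/density: {minimum_specific_strength():.1e} J/kg")
```

For the sun the radius comes out to roughly 3 km, compared with its actual radius of about 700,000 km.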
The cause-and-effect interpretation of relativity tells us that this Newtonian picture is incorrect. A physical object that approaches to within a distance r of a concentration of mass M, with M/r sufficiently large, has no causal future lying at larger values of r. The conclusion is that there is a limit on the tensile strength of any substance, imposed purely by general relativity, and we can state this limit without having to know anything about the physical nature of the interatomic forces. Cf. homework problem 3 and section 3.4.4, as well as some references given in the remark following problem 3.
In classical physics, velocities add in relative motion. For example, if a boat moves relative to a river, and the river moves relative to the land, then the boat's velocity relative to the land is found by vector addition. This linear behavior cannot hold relativistically. For example, if a spaceship is moving at 0.60c relative to the earth, and it launches a probe at 0.60c relative to itself, we can't have the probe moving at 1.20c relative to the earth, because this would be greater than the maximum speed of cause and effect, c. To see how to add velocities relativistically, we start by rewriting the Lorentz transformation as the matrix \left( \begin{array}{cc} \cosh\eta & \sinh\eta \\ \sinh\eta & \cosh\eta \end{array} \right) , where η=tanh⁻¹ v is called the rapidity. We are guaranteed that the matrix can be written in this form, because its area-preserving property says that the determinant equals 1, and cosh²η-sinh²η=1 is an identity of the hyperbolic trig functions. It is now straightforward to verify that multiplication of two matrices of this form gives a third matrix that is also of this form, with η=η1+η2. In other words, rapidities add linearly; velocities don't. In the example of the spaceship and the probe, the rapidities add as tanh⁻¹ 0.60 + tanh⁻¹ 0.60 = 0.693 + 0.693 = 1.386, giving the probe a velocity of tanh 1.386=0.88 relative to the earth. Any number of velocities can be added in this way, η1+η2+…+ηn.
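The rule that rapidities add can be tried out numerically. The following is a small sketch in Python, with velocities in units of c; the function name `add_velocities` is made up for this example.

```python
import math

def add_velocities(*velocities):
    """Combine collinear velocities (in units of c) by adding rapidities."""
    total_rapidity = sum(math.atanh(v) for v in velocities)
    return math.tanh(total_rapidity)

# The spaceship and the probe, each at 0.60c:
print(round(add_velocities(0.60, 0.60), 2))   # 0.88

# Stacking several large velocities still never reaches c:
print(add_velocities(0.9, 0.9, 0.9) < 1.0)    # True
```

For two velocities this agrees with the familiar closed form (u+v)/(1+uv), which gives 1.20/1.36 for the example above.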
Self-check: Interpret the asymptotes of the graph in figure c.
Let spaceships A and B accelerate as shown in figure d along a straight line. Observer C does not accelerate. The accelerations, as judged by C, are constant for both ships. Each ship is equipped with a yard-arm, and a thread is tied between the two arms. Does the thread break, due to Lorentz contraction? (We assume that the acceleration is gentle enough that the thread does not break simply because of its own inertia.)
The popular answer in the CERN cafeteria was that the thread would not break, the reasoning being that Lorentz contraction is a frame-dependent effect, and no such contraction would be observed in A and B's frame. The ships maintain a constant distance from one another, so C merely disagrees with A and B about the length of the thread, as well as other lengths like the lengths of the spaceships.
The error in this reasoning is that the accelerations of A and B were specified to be equal and constant in C's frame, not in A and B's. Bell's interpretation is that the frame-dependence is a distraction, that Lorentz contraction is in some sense a real effect, and that it is therefore immediately clear that the thread must break, without even having to bother going into any other frame. To convince his peers in the cafeteria, however, Bell presumably needed to satisfy them as to the specific errors in their reasoning, and this requires that we consider the frame-dependence explicitly.
We can first see that it is impossible, in general, for different observers to agree about what is meant by constant acceleration. Suppose that A and B agree with C about the constancy of their acceleration. Then A and B experience a voyage in which the rapidities of the stars around them (and of observer C) increase linearly with time. As the rapidity approaches infinity, both C and the stars approach the speed of light. But since A and C agree on the magnitude of their velocity relative to one another, this means that A's velocity as measured by C must approach c, and this contradicts the premise that C observes constant acceleration for both ships. Therefore A and B do not consider their own accelerations to be constant.
A and B do not agree with C about simultaneity, and since they also do not agree that their accelerations are constant, they do not consider their own accelerations to be equal at a given moment of time. Therefore the string changes its length, and this is consistent with Bell's original, simple answer, which did not require comparing different frames of reference. To establish that the string comes under tension, rather than going slack, we can apply the equivalence principle. By the equivalence principle, any experiments done by A and B give the same results as if they were immersed in a gravitational field. The leading ship B sees A as experiencing a gravitational time dilation. According to B, the slowpoke A isn't accelerating as rapidly as it should, causing the string to break.
These ideas are closely related to the fact that general relativity does not admit any spacetime that can be interpreted as a uniform gravitational field (see problem 5, p. 162).
The trichotomous classification of causal relationships has interesting logical implications. In classical Aristotelian logic, every proposition is either true or false, but not both, and given propositions p and q, we can form propositions such as p∧q (both p and q) or p∨q (either p or q). Propositions about physical phenomena can only be verified by observation. Let p be the statement that a certain observation carried out at event P gives a certain result, and similarly for q at Q. If PQ is spacelike, then the truth or falsehood of p∧q cannot be checked by physically traveling to P and Q, because no observer would be able to attend both events. The truth-value of p∧q is unknown to any observer in the universe until a certain time, at which the relevant information has been able to propagate back and forth. What if P and Q lie inside two different black holes? Then the truth-value of p∧q can never be determined by any observer. Another example is the case in which P and Q are separated by such a great distance that, due to the accelerating expansion of the universe, their future light cones do not overlap.
We conclude that Aristotelian logic cannot be appropriately applied to relativistic observation in this way. Some workers attempting to construct a quantum-mechanical theory of gravity have suggested an even more radically observer-dependent logic, in which different observers may contradict one another on the truth-value of a single proposition p1, unless they agree in advance on the list p2, p3, ... of all the other propositions that they intend to test as well. We'll return to these questions on page 190.
We've already seen, in section 1.2, a variety of evidence for the non-classical behavior of spacetime. We're now in a position to discuss tests of relativity more quantitatively. An up-to-date review of such tests is given by Mattingly.11
One such test is that relativity requires the speed of light to be the same in all frames of reference, for the following reasons. Compare with the speed of sound in air. The speed of sound is not the same in all frames of reference, because the wave propagates at a fixed speed relative to the air. An observer who is moving relative to the air will measure a different speed of sound. Light, on the other hand, isn't a vibration of any physical medium. Maxwell's equations predict a definite value for the speed of light, regardless of the motion of the source. This speed also can't be relative to any medium. If the speed of light isn't fixed relative to the source, and isn't fixed relative to a medium, then it must be fixed relative to anything at all. The only speed in relativity that is equal in all frames of reference is c, so light must propagate at c. We will see on page 107 that there is a deeper reason for this; relativity requires that any massless particle propagate at c. The requirement of v=c for massless particles is so intimately hard-wired into the structure of relativity that any violation of it, no matter how tiny, would be of great interest. Essentially, such a violation would disprove Lorentz invariance, i.e., the invariance of the laws of physics under Lorentz transformations. There are two types of tests we could do: (1) test whether photons of all energies travel at the same speed, i.e., whether the vacuum is dispersive; (2) test whether observers in all frames of reference measure the same speed of light.
Some candidate quantum-mechanical theories of gravity, such as loop quantum gravity, predict a granular structure for spacetime at the Planck scale, √(ħG/c³) ≈ 10⁻³⁵ m, which would naturally lead to deviations from v=1 that would become more and more significant for photons with wavelengths getting closer and closer to that scale. Lorentz-invariance would then be an approximation valid only at large scales.
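The size of the Planck scale follows from the fundamental constants. The short sketch below evaluates the Planck length, conventionally defined as √(ħG/c³), using rounded SI values; the variable names are chosen for this illustration only.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J s
G = 6.67430e-11          # gravitational constant, m^3 kg^-1 s^-2
C = 2.99792458e8         # speed of light, m/s

# Planck length: the only length that can be built from hbar, G, and c.
planck_length = math.sqrt(HBAR * G / C**3)
print(f"Planck length = {planck_length:.1e} m")   # about 1.6e-35 m
```

This is some twenty orders of magnitude smaller than an atomic nucleus, which is why astronomical baselines are needed to look for any accumulated effect.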
Presently the best experimental tests of the invariance of the speed of light with respect to wavelength come from astronomical observations of gamma-ray bursts, which are sudden outpourings of high-energy photons, believed to originate from a supernova explosion in another galaxy. One such observation, in 2009,12 collected photons from such a burst, with a duration of 2 seconds, indicating that the propagation time of all the photons differed by no more than 2 seconds out of a total time in flight on the order of ten billion years, or about one part in 10¹⁷! A single superlative photon in the burst had an energy of 31 GeV, and its arrival within the same 2-second time window demonstrates Lorentz invariance over a vast range of photon energies, ruling out some versions of loop quantum gravity.
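The order-of-magnitude figure quoted for the gamma-ray burst is easy to reproduce. The sketch below takes the 2-second spread and the ten-billion-year flight time from the text; the variable names and the use of a 365.25-day year are assumptions of the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

arrival_spread = 2.0                        # seconds, from the text
time_in_flight = 1.0e10 * SECONDS_PER_YEAR  # ten billion years, in seconds

# Upper bound on the fractional variation in the speed of light
# across the photon energies in the burst.
fractional_bound = arrival_spread / time_in_flight
print(f"fractional bound = {fractional_bound:.0e}")
```

The result is a little under 10⁻¹⁷, consistent with the rough "one part in 10¹⁷" estimate.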
The constancy of the speed of light for observers in all frames of reference was originally detected in 1887 when Michelson and Morley set up a clever apparatus to measure any difference in the speed of light beams traveling east-west and north-south. The motion of the earth around the sun at 110,000 km/hour (about 0.01% of the speed of light) is to our west during the day. Michelson and Morley believed that light was a vibration of a physical medium, the ether, so they expected that the speed of light would be a fixed value relative to the ether. As the earth moved through the ether, they thought they would observe an effect on the velocity of light along an east-west line. For instance, if they released a beam of light in a westward direction during the day, they expected that it would move away from them at less than the normal speed because the earth was chasing it through the ether. They were surprised when they found that the expected 0.01% change in the speed of light did not occur.

b / The Michelson-Morley experiment, shown in photographs, and drawings from the original 1887 paper. 1. A simplified drawing of the apparatus. A beam of light from the source, s, is partially reflected and partially transmitted by the half-silvered mirror h1. The two half-intensity parts of the beam are reflected by the mirrors at a and b, reunited, and observed in the telescope, t. If the earth's surface was supposed to be moving through the ether, then the times taken by the two light waves to pass through the moving ether would be unequal, and the resulting time lag would be detectable by observing the interference between the waves when they were reunited. 2. In the real apparatus, the light beams were reflected multiple times. The effective length of each arm was increased to 11 meters, which greatly improved its sensitivity to the small expected difference in the speed of light. 3. In an earlier version of the experiment, they had run into problems with its “extreme sensitiveness to vibration,” which was “so great that it was impossible to see the interference fringes except at brief intervals ... even at two o'clock in the morning.” They therefore mounted the whole thing on a massive stone floating in a pool of mercury, which also made it possible to rotate it easily. 4. A photo of the apparatus. Note that it is underground, in a room with solid brick walls.
Although the Michelson-Morley experiment was nearly two decades in the past by the time Einstein published his first paper on relativity in 1905, and Einstein did know about it,13 it's unclear how much it influenced him. Michelson and Morley themselves were uncertain about whether the result was to be trusted, or whether systematic and random errors were masking a real effect from the ether. There were a variety of competing theories, each of which could claim some support from the shaky data. Some physicists believed that the ether could be dragged along by matter moving through it, which inspired variations on the experiment that were conducted on mountaintops in thin-walled buildings, (figure), or with one arm of the apparatus out in the open, and the other surrounded by massive lead walls. In the standard sanitized textbook version of the history of science, every scientist does his experiments without any preconceived notions about the truth, and any disagreement is quickly settled by a definitive experiment. In reality, this period of confusion about the Michelson-Morley experiment lasted for four decades, and a few reputable skeptics, including Miller, continued to believe that Einstein was wrong, and kept trying different variations of the experiment as late as the 1920's. Most of the remaining doubters were convinced by an extremely precise version of the experiment performed by Joos in 1930, although you can still find kooks on the internet who insist that Miller was right, and that there was a vast conspiracy to cover up his results.
c / Dayton Miller thought that the result of the Michelson-Morley experiment could be explained because the ether had been pulled along by the dirt, and the walls of the laboratory. This motivated him to carry out a series of experiments at the top of Mount Wilson, in a building with thin walls.
Before Einstein, some physicists who did believe the negative result of the Michelson-Morley experiment came up with explanations that preserved the ether. In the period from 1889 to 1895, both Lorentz and George FitzGerald suggested that the negative result of the Michelson-Morley experiment could be explained if the earth, and every physical object on its surface, was contracted slightly by the strain of the earth's motion through the ether. Thus although Lorentz developed all the mathematics of Lorentz frames, and got them named after himself, he got the interpretation wrong.
d / The results of the measurement of g by Chung et al., section 2.4.3. The experiment was done on the Stanford University campus, surrounded by the Pacific ocean and San Francisco Bay, so it was subject to varying gravitational forces from both astronomical bodies and the rising and falling ocean tides. Once both of these effects are subtracted out of the data, there is no Lorentz-violating variation in g due to the earth's motion through space. Note that the data are broken up into three periods, with gaps of three months and four years separating them. (c) APS, used under the U.S. fair use exception to copyright.

e / The matter interferometer used by Chung et al. Each atom's wavefunction is split into two parts, which travel along two different paths (solid and dashed lines).
The tests described in sections 2.4.1 and 2.4.2 both involve the behavior of light, i.e., they test whether or not electromagnetism really has the exact Lorentz-invariant behavior contained implicitly in Maxwell's equations. In the jargon of the field, they test Lorentz invariance in the “photon sector.” Since relativity is a theory of gravity, it is natural to ask whether the Lorentz invariance holds for gravitational forces as well as electromagnetic ones. If Lorentz invariance is violated by gravity, then the strength of gravitational forces might depend on the observer's motion through space, relative to some fixed reference frame analogous to that of the ether. Historically, gravitational Lorentz violations have been much more difficult to test, since gravitational forces are so weak, and the first high-precision data were obtained by Nordtvedt and Will in 1957, 70 years after Michelson and Morley. Nordtvedt and Will measured the strength of the earth's gravitational field as a function of time, and found that it did not vary on a 24-hour cycle with the earth's rotation, once tidal effects had been accounted for. Further constraints come from data on the moon's orbit obtained by reflecting laser beams from a mirror left behind by the Apollo astronauts.
A recent high-precision laboratory experiment was done in 2009 by Chung et al.14 They constructed an interferometer in a vertical plane that is conceptually similar to a Michelson interferometer, except that it uses cesium atoms rather than photons. That is, the light waves of the Michelson-Morley experiment are replaced by quantum-mechanical matter waves. The roles of the half-silvered and fully silvered mirrors are filled by lasers, which kick the atoms electromagnetically. Each atom's wavefunction is split into two parts, which travel by two different paths through spacetime, eventually reuniting and interfering. The result is a measurement of g to about one part per billion. The results, shown in figure d, put a strict limit on violations of Lorentz geometry by gravity.
New and nontrivial phenomena arise when we generalize from 1+1 dimensions to 3+1.

a / A boost along x followed by a boost along y results in tangling up of the x and y coordinates, so the result is not just a boost but a boost plus a rotation.
How does a Lorentz boost along one axis, say x, affect the other two spatial coordinates y and z? We have already proved that area in the (t,x) plane is preserved. The same proof applies to volume in the spaces (t,x,y) and (t,x,z), hence lengths in the y and z directions are preserved. (The proof does not apply to volume in, e.g., (x,y,z) space, because the x transformation depends on t, and therefore if we are given a region in (x,y,z), we do not have enough information to say how it will change under a Lorentz boost.) The complete form of the transformation L(vx̂), a Lorentz boost along the x axis with velocity v, is therefore:
t′ = γt + vγx
x′ = vγt + γx
y′ = y
z′ = z
Based on the trivial nature of this generalization, it might seem as though no qualitatively new considerations would arise in 3+1 dimensions as compared with 1+1. To see that this is not the case, consider figure a. A boost along the x axis tangles up the x and t coordinates. A y-boost mingles y and t. Therefore consecutive boosts along x and y can cause x and y to mix. The result, as we'll see in more detail below, is that two consecutive boosts along non-collinear axes are not equivalent to a single boost; they are equivalent to a boost plus a spatial rotation. The remainder of this section discusses this effect, known as Thomas precession, in more detail; it can be omitted on a first reading.
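The claim that two non-collinear boosts do not compose to a pure boost can be checked numerically. The sketch below works in 2+1 dimensions with the basis (t, x, y) and c=1, and relies on the fact that the matrix of a pure boost is symmetric, so an asymmetric product signals an extra rotation. All function names are invented for this example.

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def boost_x(v):
    """Boost along x in the (t, x, y) basis."""
    g = gamma(v)
    return [[g, g * v, 0.0],
            [g * v, g, 0.0],
            [0.0, 0.0, 1.0]]

def boost_y(v):
    """Boost along y in the (t, x, y) basis."""
    g = gamma(v)
    return [[g, 0.0, g * v],
            [0.0, 1.0, 0.0],
            [g * v, 0.0, g]]

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def is_symmetric(m, tol=1e-12):
    return all(abs(m[i][j] - m[j][i]) <= tol
               for i in range(3) for j in range(3))

v = 0.6
x_then_y = matmul(boost_y(v), boost_x(v))   # x boost applied first
y_then_x = matmul(boost_x(v), boost_y(v))   # y boost applied first

print(is_symmetric(boost_x(v)))   # True: a single boost is symmetric
print(is_symmetric(x_then_y))     # False: the product is not a pure boost
print(x_then_y == y_then_x)       # False: the order matters
```

The same check with both boosts along x gives a symmetric product, in line with the statement that collinear boosts involve no rotation.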
Self-check: Apply similar reasoning to a Galilean boost.

b / Inertial devices for maintaining a direction in space: 1. A ring laser. 2. The photon in a perfectly reflective spherical cavity. 3. A gyroscope.
To see how this mathematical fact would play out as a physical effect, we need to consider how to make a physical manifestation of the concept of a direction in space.
In two space dimensions, we can construct a ring laser, b/1, which in its simplest incarnation is a closed loop of optical fiber with a bidirectional laser inserted in one place. Coherent light traverses the loop simultaneously in both directions, interfering in a beat pattern, which can be observed by sampling the light at some point along the loop's circumference. If the loop is rotated in its own plane, the interference pattern is altered, because the beam-sampling device is in a different place, and the path lengths traveled by the two beams have been altered. This phase shift is called the Sagnac effect, after M. Georges Sagnac, who observed the effect in 1913 and interpreted it, incorrectly, as evidence for the existence of the aether.15 The loop senses its own rotation relative to an inertial reference frame. If we transport the loop while always carefully adjusting its orientation so as to prevent phase shifts, then its orientation has been preserved. The atomic clocks used in the Hafele-Keating atomic-clock experiment described on page 15 were sensitive to Sagnac effects, and it was not practical to maintain their orientations while they were strapped into seats on a passenger jet, so this orientational effect had to be subtracted out of the data at the end of the experiment.
In three spatial dimensions, we could build a spherical cavity with a reflective inner surface, and release a photon inside, b/2.
In reality, the photon-in-a-cavity is not very practical. The photon would eventually be absorbed or scattered, and it would also be difficult to accurately initialize the device and read it out later. A more practical tool is a gyroscope. For example, one of the classic tests of general relativity is the 2007 Gravity Probe B experiment (discussed in detail on pages 139 and 175), in which four gyroscopes aboard a satellite were observed to precess due to special- and general-relativistic effects.
The gyroscope, however, is not so obviously a literal implementation of our basic concept of a direction. How, then, can we be sure that its behavior is equivalent to that of the photon-in-a-cavity? We could, for example, carry out a complete mathematical development of the angular momentum vector in relativity.16 The equivalence principle, however, allows us to bypass such technical details. Suppose that we seal the two devices inside black boxes, with identical external control panels for initializing them and reading them out. We initialize them identically, and then transport them along side-by-side world-lines. Classically, both the mechanical gyroscope and the photon-gyroscope would maintain absolute, fixed directions in space. Relativistically, they will not necessarily maintain their orientations. For example, we've already seen in section 2.5.1 that there are reasons to expect that their orientations will change if they are subjected to accelerations that are not all along the same line. Because relativity is a geometrical theory of spacetime, this difference between the classical and relativistic behavior must be determinable from purely geometrical considerations, such as the shape of the world-line. If it depended on something else, then we could conceivably see a disagreement in the outputs of the two instruments, but this would violate the equivalence principle.
Suppose there were such a discrepancy. That discrepancy would be a physically measurable property of the spacetime region through which the two gyroscopes had been transported. The effect would have a certain magnitude and direction, so by collecting enough data we could map it out as a vector field covering that region of spacetime. This field evidently causes material particles to accelerate, since it has an effect on the mechanical gyroscope. Roughly speaking (the reasoning will be filled in more rigorously on page 118), the fact that this field acts differently on the two gyroscopes is like getting a non-null result from an Eötvös experiment, and it therefore violates the equivalence principle. We conclude that gyroscopes b/2 and b/3 are equivalent. In other words, there can only be one uniquely defined notion of direction, and the details of how it is implemented are irrelevant.

c / Classically, the gyroscope should not rotate as long as the forces from the hammer are all transmitted to it at its center of mass.

e / The velocity disk.

f / Two excursions in a rocket-ship: one along the y axis and one along x.

g / A round-trip involving ultrarelativistic velocities. All three legs are at constant acceleration.

h / In the limit where A and B are ultrarelativistic velocities, leg AB is perpendicular to the edge of the velocity disk. The result is that the x-y frame determined by the ship's gyroscopes has rotated by 90 degrees by the time it gets home.

i / If the crack between the two areas is squashed flat, the two pieces of the path on the interior coincide, and their contributions to the precession cancel out (v→-v, but a→ +a, so a×v→ -a×v). Therefore the precession χ obtained by going around the outside is equal to the sum χ1+χ2 of the precessions that would have been obtained by going around the two parts.
As a quantitative example, consider the following thought experiment. Put a gyroscope in a box, and send the box around the square path shown in figure c at constant speed. The gyroscope defines a local coordinate system, which according to classical physics would maintain its orientation. At each corner of the square, the box has its velocity vector changed abruptly, as represented by the hammer. We assume that the hits with the hammer are transmitted to the gyroscope at its center of mass, so that they do not result in any torque. Classically, if the set of gyroscopes travels once around the square, it should end up at the same place and in the same orientation, so that the coordinate system it defines is identical with the original one.
For notation, let L(vx̂) indicate the boost along the x axis described by the transformation on page 59. This is a transformation that changes to a frame of reference moving in the negative x direction compared to the original frame. A particle considered to be at rest in the original frame is described in the new frame as moving in the positive x direction. Applying such an L to a vector p, we calculate Lp, which gives the coordinates of the event as measured in the new frame. An expression like MLp is equivalent by associativity to M(Lp), i.e., ML represents applying L first, and then M.
In this notation, the four hammer strikes are represented by the product of four boosts, T = L(vx̂)L(vŷ)L(-vx̂)L(-vŷ). The first transformation applied, L(-vŷ), changes coordinates measured by the original gyroscope-defined frame to new coordinates measured by the new gyroscope-defined frame, after the box has been accelerated in the positive y direction.

d / A page from one of Einstein's notebooks.
The calculation of T is messy, and to be honest, I made a series of mistakes when I tried to crank it out by hand. Calculations in relativity have a reputation for being like this. Figure d shows a page from one of Einstein's notebooks, written in fountain pen around 1913. At the bottom of the page, he wrote “zu umstaendlich,” meaning “too involved.” Luckily we live in an era in which this sort of thing can be handled by computers. Starting at this point in the book, I will take appropriate opportunities to demonstrate how to use the free and open-source computer algebra system Maxima to keep complicated calculations manageable. The following Maxima program calculates a particular element of the matrix T.
/* For convenience, define gamma in terms of v: */
gamma:1/sqrt(1-v*v);
/* Define Lx as L(x-hat), Lmx as L(-x-hat), etc.: */
Lx:matrix([gamma, gamma*v, 0],
[gamma*v, gamma, 0],
[0, 0, 1]);
Ly:matrix([gamma, 0, gamma*v],
[0, 1, 0],
[gamma*v, 0, gamma]);
Lmx:matrix([gamma, -gamma*v, 0],
[-gamma*v, gamma, 0],
[0, 0, 1]);
Lmy:matrix([gamma, 0, -gamma*v],
[0, 1, 0],
[-gamma*v, 0, gamma]);
/* Calculate the product of the four matrices: */
T:Lx.Ly.Lmx.Lmy;
/* Define a column vector along the x axis: */
P:matrix([0],[1],[0]);
/* Find the result of T acting on this vector,
expressed as a Taylor series to second order in v: */
taylor(T.P,v,0,2);
Statements are terminated by semicolons, and comments are written like /* ... */ The second line of the program is a symbolic definition of the symbol gamma in terms of the symbol v. The colon means “is defined as.” This line does not mean, as it would in most programming languages, to take a stored numerical value of v and use it to calculate a numerical value of γ. In fact, v does not have a numerical value defined at this point, nor will it ever have a numerical value defined for it throughout this program. The line simply means that whenever Maxima encounters the symbol gamma, it should take it as an abbreviation for the symbol 1/sqrt(1-v*v). The next four statements define some 3×3 matrices that represent the L transformations. The basis is t̂, x̂, ŷ. The statement defining T calculates the product of the four matrices; the dots represent matrix multiplication. The statement defining P sets up a vector along the x axis, expressed as a column matrix (three rows of one column each) so that Maxima will know how to operate on it using matrix multiplication by T. Finally the taylor statement outputs17 the result of T acting on P:
           [  0 + . . .   ]
           [              ]
(%o9)/T/   [  1 + . . .   ]
           [              ]
           [    2         ]
           [ - v  + . . . ]
In other words, T\left(\begin{array}{c}0 \\ 1 \\ 0\end{array}\right) = \left(\begin{array}{c}0 \\ 1 \\ -v^2\end{array}\right) + \ldots , where … represents higher-order terms in v. Suppose that we use the initial frame of reference, before T is applied, to determine that a particular reference point, such as a distant star, is along the x axis. Applying T, we get a new vector TP, which we find has a nonvanishing y component approximately equal to -v². This result is entirely unexpected classically. It tells us that the gyroscope, rather than maintaining its original orientation as it would have done classically, has rotated slightly. It has precessed in the counterclockwise direction in the x-y plane, so that the direction to the star, as measured in the coordinate system defined by the gyroscope, appears to have rotated clockwise. As the box moved clockwise around the square, the gyroscope has apparently rotated by a counterclockwise angle χ≈ v² about the z axis. We can see that this is a purely relativistic effect, since for v << 1 the effect is small. For historical reasons discussed in section 2.5.4, this phenomenon is referred to as the Thomas precession.
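The symbolic result can be cross-checked numerically. The sketch below is a plain-Python counterpart to the Maxima program, not a replacement for it: it builds the same four matrices in the (t, x, y) basis, multiplies them in the same order, and applies the product to a unit vector along x for a small value of v. The helper names are invented for this example.

```python
import math

def boost(v, axis):
    """3x3 boost matrix in the (t, x, y) basis; axis is 1 for x, 2 for y."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m[0][0] = g
    m[axis][axis] = g
    m[0][axis] = g * v
    m[axis][0] = g * v
    return m

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, p):
    return [sum(m[i][k] * p[k] for k in range(3)) for i in range(3)]

v = 0.01
# Same order as the Maxima line T:Lx.Ly.Lmx.Lmy
T = matmul(matmul(boost(v, 1), boost(v, 2)),
           matmul(boost(-v, 1), boost(-v, 2)))

P = [0.0, 1.0, 0.0]    # unit vector along the x axis
TP = matvec(T, P)

print(TP)              # the y component is close to -v**2
print(-v * v)
```

For v=0.01 the y component agrees with -v² to within a few parts in 10⁻⁸, while the t and x components stay at 0 and 1 up to higher-order terms.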
The particular features of this square geometry are not necessary. I chose them so that (1) the boosts would be along the Cartesian axes, so that we would be able to write them down easily; (2) it is clear that the effect doesn't arise from any asymmetric treatment of the spatial axes; and (3) the change in the orientation of the gyroscope can be measured at the same point in space, e.g., by comparing it with a twin gyroscope that stays at home. In general:
Self-check: If Lorentz boosts did commute, what would be the consequences
for the expression
?
Figure e shows a useful way of visualizing the combined effects of boosts and rotations in 2+1 dimensions. The disk depicts all possible states of motion relative to some arbitrarily chosen frame of reference. Lack of motion is represented by the point at the center. A point at distance v from the center represents motion at velocity v in a particular direction in the x-y plane. By drawing little axes at a particular point, we can represent a particular frame of reference: the frame is in motion at some velocity, with its own x and y axes oriented in a particular way.
It turns out to be easier to understand the qualitative behavior of our mysterious rotations if we switch from the low-velocity limit to the contrary limit of ultrarelativistic velocities. Suppose we have a rocket-ship with an inertial navigation system consisting of two gyroscopes at right angles to one another. We first accelerate the ship in the y direction, and the acceleration is steady in the sense that it feels constant to observers aboard the ship. Since it is rapidities, not velocities, that add linearly, this means that as an observer aboard the ship reads clock times τ1, τ2, ..., all separated by equal intervals Δτ, the ship's rapidity changes at a constant rate, η1, η2, .... This results in a series of frames of reference that appear closer and closer together on the diagram as the ship approaches the speed of light, at the edge of the disk. We can start over from the center again and repeat the whole process along the x axis, resulting in a similar succession of frames. In both cases, the boosts are being applied along a single line, so that there is no rotation of the x and y axes.
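The crowding of frames toward the edge of the disk can be illustrated with a short numerical sketch. It relies on the standard relation v = tanh η between velocity and rapidity (with c=1); the step size and the function names are arbitrary choices made for this illustration.

```python
import math

def velocities_for_equal_rapidity_steps(d_eta, n):
    """Velocities after 0, 1, ..., n equal rapidity increments d_eta (c = 1)."""
    return [math.tanh(i * d_eta) for i in range(n + 1)]

def add_velocities(u, v):
    """Relativistic combination of collinear velocities (c = 1)."""
    return (u + v) / (1.0 + u * v)

vs = velocities_for_equal_rapidity_steps(0.5, 8)
gaps = [b - a for a, b in zip(vs, vs[1:])]

for i, v in enumerate(vs):
    print(f"step {i}: v = {v:.6f}")
print("successive gaps in v:", [round(g, 6) for g in gaps])

# Adding rapidities linearly is equivalent to combining the velocities.
print(add_velocities(math.tanh(0.5), math.tanh(0.5)), math.tanh(1.0))
```

The velocities approach 1 without reaching it, and the gaps between successive frames shrink, which is the bunching near the edge of the disk described above.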
Now suppose that the ship were to accelerate along a route like the one shown in figure g. It first accelerates along the y axis at a constant rate (again, as judged by its own sensors), until its velocity is very close to the speed of light, A. It then accelerates, again at a self-perceived constant rate and with thrust in a fixed direction as judged by its own gyroscopes, until it is moving at the same ultrarelativistic speed in the x direction, B. Finally, it decelerates in the x direction until it is again at rest, O. This motion traces out a clockwise loop on the velocity disk. The motion in space is also clockwise.
We might naively think that the middle leg of the trip, from A to B, would be a straight line on the velocity disk, but this can't be the case. First, we know that non-collinear boosts cause rotations. Traveling around a clockwise path causes counterclockwise rotation, and vice-versa. Therefore an observer in the rest frame O sees the ship (and its gyroscopes) as rotating as it moves from A to B. The ship's trajectory through space is clockwise, so according to O the ship rotates counterclockwise as it goes A to B. The ship is always firing its engines in a fixed direction as judged by its gyroscopes, but according to O the ship is rotating counterclockwise, its thrust is progressively rotating counterclockwise, and therefore its trajectory turns counterclockwise. We conclude that leg AB on the velocity disk is concave, rather than being a straight-line hypotenuse of a triangle OAB.
We can also determine, by the following argument, that leg AB is perpendicular to the edge of the disk where it touches the edge of the disk. In the transformation from frame A to frame O, y coordinates are dilated by a factor of γ, which approaches infinity in the limit we're presently considering. Observers aboard the rocket-ship, occupying frame A, believe that their task is to fire the rocket's engines at an angle of 45 degrees with respect to the y axis, so as to eliminate their velocity with respect to the origin, and simultaneously add an equal amount of velocity in the x direction. This 45-degree angle in frame A, however, is not a 45-degree angle in frame O. From the stern of the ship to its bow we have displacements Δ x and Δ y, and in the transformation from A to O, Δ y is magnified almost infinitely. As perceived in frame O, the ship's orientation is almost exactly antiparallel to the y axis.18
As the ship travels from A to B, its orientation (as judged in frame O) changes from $-\hat{y}$ to $\hat{x}$.
This establishes, in a much more direct fashion, the direction of the Thomas precession: its handedness is contrary to the handedness of the direction of motion.
We can now also see something new about the fundamental reason for the effect. It has to do with the fact that observers in
different states of motion disagree on spatial angles. Similarly, imagine that you are a two-dimensional being who was
told about the existence of a new, third, spatial dimension. You have always believed that the cosine of the angle between
two unit vectors u and v is given by the vector dot product $u_xv_x+u_yv_y$. If you were allowed to explore
a two-dimensional projection of a three-dimensional scene, e.g., on the flat screen of a television, it would seem to you
as if all the angles had been distorted. You would have no way to interpret the visual conventions of perspective.
But once you had learned about the existence of a z axis, you would realize that these angular distortions were
happening because of rotations out of the x-y plane. Such rotations really conserve the quantity $u_xv_x+u_yv_y+u_zv_z$; only
because you were ignoring the $u_zv_z$ term did it seem that angles were not being preserved. Similarly, the generalization
from three Euclidean spatial dimensions to 3+1-dimensional spacetime means that three-dimensional dot products are no
longer conserved.
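The two-dimensional analogy can be made concrete with a small numerical sketch. The vectors, the rotation axis (the y axis), and the angles below are arbitrary choices for illustration and are not taken from the text.

```python
import math

def rotate_about_y(p, phi):
    """Rotate a 3-vector about the y axis by angle phi (tilts it out of the x-y plane)."""
    x, y, z = p
    return (x * math.cos(phi) + z * math.sin(phi),
            y,
            -x * math.sin(phi) + z * math.cos(phi))

def dot2(u, v):
    """The dot product a two-dimensional being would compute."""
    return u[0] * v[0] + u[1] * v[1]

def dot3(u, v):
    """The full three-dimensional dot product."""
    return dot2(u, v) + u[2] * v[2]

theta = math.radians(60.0)   # angle between the two unit vectors
phi = math.radians(40.0)     # rotation out of the x-y plane

u = (1.0, 0.0, 0.0)
v = (math.cos(theta), math.sin(theta), 0.0)
u_rot = rotate_about_y(u, phi)
v_rot = rotate_about_y(v, phi)

print("2D dot before/after:", dot2(u, v), dot2(u_rot, v_rot))
print("3D dot before/after:", dot3(u, v), dot3(u_rot, v_rot))
```

The three-dimensional dot product is unchanged by the rotation, while the two-dimensional one is not, which is the apparent distortion of angles seen on the flat screen.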
Let's find the low-v limit of the Thomas precession in general, not just in the highly artificial special case of $\chi\approx v^2$ for the example involving the four hammer hits. To generalize to the case of smooth acceleration, we first note that the rate of precession $d\chi/dt$ must have the following properties.
The only rotationally invariant mathematical operation that has these symmetry properties is the vector cross product, so the rate of precession must be $k\,a\times v$, where k>0 is nearly independent of v and a for small v and a.
To pin down the value of k, we need to find a connection between our two results: $\chi\approx v^2$ for the four hammer hits, and $d\chi/dt\approx k\,a\times v$ for smooth acceleration. We can do this by considering the physical significance of areas on the velocity disk. As shown in figure i, the rotation χ due to carrying the velocity around the boundary of a region is additive when adjacent regions are joined together. We can therefore find χ for any region by breaking the region down into elements of area dA and integrating their contributions dχ. What is the relationship between dA and dχ? The velocity disk's structure is nonuniform, in the sense that near the edge of the disk, it takes a larger boost to move a small distance. But we're investigating the low-velocity limit, and in the low-velocity region near the center of the disk, the disk's structure is approximately uniform. We therefore expect that there is an approximately constant proportionality factor relating dA and dχ at low velocities. The example of the hammer corresponds geometrically to a square with area $v^2$, so we find that this proportionality factor is unity, $dA\approx d\chi$.
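The proportionality between rotation angle and area can be probed numerically for small rectangles on the velocity disk. This is only a sketch under stated assumptions: the rectangle has one corner at the center of the disk, the boosts are composed in the assumed order Lx(a) Ly(b) Lx(-a) Ly(-b), and the angle is read off from the image of a unit vector along x. The helper names are hypothetical.

```python
import math

def boost(v, axis):
    """Boost with speed v along axis 1 (x) or 2 (y), basis (t, x, y), c = 1."""
    g = 1.0 / math.sqrt(1.0 - v * v)
    m = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    m[0][0] = g
    m[axis][axis] = g
    m[0][axis] = g * v
    m[axis][0] = g * v
    return m

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def loop_angle(a, b):
    """Rotation angle chi for an a-by-b rectangle of boosts (assumed ordering)."""
    t = boost(a, 1)
    for m in (boost(b, 2), boost(-a, 1), boost(-b, 2)):
        t = matmul(t, m)
    # Image of the unit vector along x is the x column of t.
    x_comp, y_comp = t[1][1], t[2][1]
    # The vector turns clockwise, so the gyroscope's angle is minus its polar angle.
    return math.atan2(-y_comp, x_comp)

for a, b in [(0.01, 0.01), (0.02, 0.01), (0.02, 0.02)]:
    chi = loop_angle(a, b)
    print(f"a={a}, b={b}: chi={chi:.6e}, area={a * b:.6e}, ratio={chi / (a * b):.6f}")
```

At these low velocities the ratio of χ to the enclosed area should be close to 1, in line with the proportionality factor of unity found above.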
To relate this to smooth acceleration, consider a particle performing
circular motion with period T, which has $|a\times v|=2\pi v^2/T$.
Over one full period of the motion, we have $\chi=\int k|a\times v|\,dt=2\pi k v^2$, and the
particle's velocity vector traces a circle of area $A=\pi v^2$ on the velocity disk. Equating A and χ, we
find k=1/2. The result is that in the limit of low velocities, the rate of rotation is

$\Omega \approx \frac{1}{2}\, a\times v$,

where Ω is the angular velocity vector of the rotation.
In the special case of circular motion, this can be written as $\Omega=(1/2)v^2\omega$, where ω=2π/T is
the angular frequency of the motion.
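As a consistency check on the value k=1/2, the low-velocity rate (1/2)a×v can be summed over one period of circular motion and compared with the area of the circle traced on the velocity disk. The sketch below assumes counterclockwise motion in the x-y plane and arbitrary units for the period; the function names are hypothetical.

```python
import math

def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def thomas_rate(a, v):
    """Low-velocity precession rate, (1/2) a x v, with c = 1."""
    return tuple(0.5 * c for c in cross(a, v))

def precession_per_orbit(speed, period, steps=1000):
    """Sum the z component of the rate over one counterclockwise circular orbit."""
    omega = 2.0 * math.pi / period
    dt = period / steps
    chi = 0.0
    for i in range(steps):
        phase = omega * i * dt
        v = (-speed * math.sin(phase), speed * math.cos(phase), 0.0)
        a = (-speed * omega * math.cos(phase),
             -speed * omega * math.sin(phase), 0.0)
        chi += thomas_rate(a, v)[2] * dt
    return chi

speed = 0.01
chi = precession_per_orbit(speed, period=1.0)
print("chi per orbit      :", chi)
print("area on disk, pi v^2:", math.pi * speed ** 2)
```

The magnitude of χ matches $\pi v^2$, and its sign is negative for counterclockwise motion, i.e., the precession has the opposite handedness from the orbit.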

j / States in hydrogen are labeled with their $\ell$ and s quantum numbers, representing their orbital and spin angular momenta in units of $\hbar$. The state with s=+1/2 has its spin angular momentum aligned with its orbital angular momentum, while the s=-1/2 state has the two angular momenta in opposite directions. The direction and order of magnitude of the splitting between the two
states is successfully explained by magnetic interactions with the proton, but the calculated effect is too big by a factor of 2. The relativistic Thomas precession cancels out half of the effect.
If we want to see this precession effect in real life, we should look for a system in which both v and a are large. An atom is such a system.
The Bohr model, introduced in 1913, marked the first quantitatively successful, if
conceptually muddled, description of the atomic energy levels of hydrogen. Continuing to take c=1,
the over-all scale of the energies was calculated to be proportional to $m\alpha^2$, where m is the mass of the
electron, and $\alpha=ke^2/\hbar$ (with k here denoting the Coulomb constant), known as the fine structure constant,
is essentially just a unitless way of expressing the coupling constant for electrical forces.
At higher resolution, each excited energy level is found to be split into several sub-levels. The transitions among
these close-lying states are in the millimeter region of the microwave spectrum. The energy scale of this fine structure is
$\sim m\alpha^4$. This is down by a factor of $\alpha^2$ compared to the visible-light transitions,
hence the name of the constant. Uhlenbeck and Goudsmit showed in 1926 that a splitting on this order of magnitude
was to be expected due to the magnetic
interaction between the proton and the electron's magnetic moment, oriented along its spin. The effect they calculated,
however, was too big by a factor of two.
The explanation of the mysterious factor of two had in fact been implicit in a 1916 calculation by
Willem de Sitter, one of the first applications of general relativity. De Sitter
treated the earth-moon system as a gyroscope, and found the precession of its axis of rotation,
which was partly due to the curvature of spacetime and partly due to the type of rotation described earlier
in this section.
The effect on the motion of the moon was noncumulative, and was only about one meter, which was much
too small to be measured at the time. In 1927, however, Llewellyn Thomas
applied similar reasoning to the hydrogen atom, with the electron's spin vector playing the role of gyroscope.
Since gravity is negligible here, the effect has nothing to do with curvature of spacetime, and Thomas's
effect corresponds purely to the special-relativistic part of de Sitter's result. It is simply the
rotation described above, with $\Omega=(1/2)v^2\omega$. Although Thomas was not the first to calculate it,
the effect is known as Thomas precession.
Since the electron's spin is $\hbar/2$, the energy splitting is $\pm(\hbar/2)\Omega$, depending on whether the
electron's spin is in the same direction as its orbital motion, or in the opposite direction. This is less than the
atom's gross energy scale $\hbar\omega$ by a factor of $v^2/2$, which is $\sim\alpha^2$. The Thomas
precession cancels out half of the magnetic effect, bringing theory into agreement with experiment.
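The orders of magnitude quoted above can be checked with a rough numerical estimate. The numerical constants below (electron rest energy, fine structure constant, and hc) are standard values that do not appear in the text, and taking v of order α for the electron follows the Bohr model; this is a scale estimate only, not a calculation of any particular splitting.

```python
# Assumed standard values, not taken from the text:
ELECTRON_REST_ENERGY_EV = 510998.95   # m in energy units (c = 1), in eV
ALPHA = 1.0 / 137.035999              # fine structure constant
HC_EV_NM = 1239.841984                # h*c in eV*nm, converts energy to wavelength

def gross_scale_ev():
    """Over-all atomic energy scale, m * alpha^2, in eV."""
    return ELECTRON_REST_ENERGY_EV * ALPHA ** 2

def fine_structure_scale_ev():
    """Fine-structure energy scale, m * alpha^4, in eV."""
    return ELECTRON_REST_ENERGY_EV * ALPHA ** 4

def wavelength_mm(energy_ev):
    """Photon wavelength in millimeters for a given energy in eV."""
    return HC_EV_NM / energy_ev * 1e-6   # nm -> mm

print("m alpha^2  ~", gross_scale_ev(), "eV")
print("m alpha^4  ~", fine_structure_scale_ev(), "eV")
print("wavelength ~", wavelength_mm(fine_structure_scale_ev()), "mm")
print("v^2/2 with v ~ alpha:", ALPHA ** 2 / 2.0)
```

The gross scale comes out at a few tens of eV, and the $m\alpha^4$ scale corresponds to a wavelength of roughly a millimeter, consistent with the microwave region mentioned above.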
Uhlenbeck later recalled: “...when I first heard about [the Thomas precession], it seemed unbelievable that a relativistic effect could give a factor of 2 instead of something of order v/c... Even the cognoscenti of relativity theory (Einstein included!) were quite surprised.”
1. Suppose that we don't yet know the exact form of the Lorentz transformation, but we know based on the
Michelson-Morley experiment that the speed of light is the same in all inertial frames, and we've already
determined, e.g., by arguments like those on p. 59, that there can be no length contraction
in the direction perpendicular to the motion. We construct a “light clock,”
consisting simply of two mirrors facing each other, with a light pulse bouncing back and forth
between them.
(a) Suppose this light clock is moving at a constant velocity v in the direction perpendicular
to its own optical arm, which is of length L. Use the Pythagorean theorem to prove that the clock
experiences a time dilation given by $\gamma=1/\sqrt{1-v^2}$, thereby fixing the time-time portion of the Lorentz
transformation.
(b) Why is it significant for the interpretation of special relativity that the result from part a is independent
of L?
(c) Carry out a similar calculation in the case where the clock moves with constant acceleration a
as measured in some inertial frame. Although the result depends on L, prove that in the limit of small
L, we recover the earlier constant-velocity result, with no explicit dependence on a.
(solution in the pdf version of the book)
2. (a) On p. 67 (see figure i), we showed that the Thomas precession is
proportional to area on the velocity disk. Use a similar argument to show that the Sagnac effect (p. 59) is proportional
to the area enclosed by the loop.
(b) Verify this more directly in the special case of a circular loop.
(c) Show that a light clock of the type described in problem 1 is insensitive to rotation
with constant angular velocity.
(d) Connect these results to the commutativity and transitivity assumptions in the
Einstein clock synchronization procedure described on p. 265.
(solution in the pdf version of the book)
3. Example 10 on page 51 discusses relativistic bounds on the properties of matter, using the example of pulling a bucket out of a black hole. Derive a similar bound by considering the possibility of sending signals out of the black hole using longitudinal vibrations of a cable, as in the child's telephone made of two tin cans connected by a piece of string.
4. The Maxima program on page 62 demonstrates how to multiply matrices and find Taylor series. Apply this technique to the following problem. For successive Lorentz boosts along the same axis with rapidities η1 and η2, find the matrix representing the combined Lorentz transformation, in a Taylor series up to the first nonclassical terms in each matrix element. A mixed Taylor series in two variables can be obtained simply by nesting taylor functions. The taylor function will happily work on matrices, not just scalars. (solution in the pdf version of the book)
, and construct the uniquely determined parallelogram [ABPQ] (axiom A1).
Points P and Q determine a line (axiom O1), and this line is parallel to
AB (definition of the parallelogram).
To prove that this line is unique, we argue by contradiction. Suppose some other parallel m to exist.
If m crosses the infinite line BQ at some point Z, then both [ABPQ] and [ABPZ], so by A1, Q=Z, so the
lines PQ and m are the same. The only other possibility is that m is parallel to BQ, but then the following
chain of parallelisms holds: PQ || AB || m || BQ. By A3, lines parallel to another
line are parallel to each other, so PQ || BQ, but this is a contradiction, since they have Q in common.