Invariance, Covariance, and Contravariance

To get an intuitive idea of the difference between invariance, covariance, and contravariance, suppose we have an aquarium tank filled with water, and we define rectangular Cartesian coordinates [x,y,z] to identify each point in the tank. We could now express the temperature of the water at each point by the function T(x,y,z). This is just a single number associated with each point in the tank, representing the temperature (let's say in degrees C) at that point.

Now suppose we change our minds and decide to use (spherical) polar coordinates [r,θ,ϕ] instead of rectangular coordinates [x,y,z]. These new coordinates are known functions of the original coordinates: r(x,y,z), θ(x,y,z), and ϕ(x,y,z). Clearly the value of T at a given point is invariant with respect to changes in the coordinate system. Thus, the temperature T(r,θ,ϕ) in polar coordinates is related to the temperature T(x,y,z) in rectangular coordinates by

    T(x,y,z) = T(r(x,y,z), θ(x,y,z), ϕ(x,y,z))

where r, θ, and ϕ are each functions of x, y, and z. This means the value of T at a given point is the same, regardless of the coordinate system we choose.
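As a rough numerical illustration of this invariance, the sketch below evaluates a made-up temperature field at one point, once from its Cartesian coordinates and once from its polar coordinates. The particular field, the sample point, and the convention that θ is the angle from the z axis and ϕ the azimuth are all assumptions chosen for the example, not part of the discussion above.

```python
import math

def temperature_cartesian(x, y, z):
    # Hypothetical temperature field in degrees C (assumed for illustration).
    return 20.0 + 0.5 * x + 0.1 * y * y - 0.3 * z

def to_polar(x, y, z):
    # Assumed convention: theta measured from the z axis, phi is the azimuth.
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(z / r)
    phi = math.atan2(y, x)
    return r, theta, phi

def temperature_polar(r, theta, phi):
    # The same field, expressed as a function of r, theta, phi.
    x = r * math.sin(theta) * math.cos(phi)
    y = r * math.sin(theta) * math.sin(phi)
    z = r * math.cos(theta)
    return temperature_cartesian(x, y, z)

point = (1.0, 2.0, 3.0)
t_xyz = temperature_cartesian(*point)
t_polar = temperature_polar(*to_polar(*point))
print(t_xyz, t_polar)  # the two values agree up to rounding
```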

However, suppose we had determined the gradient G(x,y,z) of the temperature at each point. This is a vector at each point [x,y,z] with the components given by the partial derivatives

    G_x = ∂T/∂x,    G_y = ∂T/∂y,    G_z = ∂T/∂z

Now suppose we want to convert the gradient to polar coordinates. If the gradient were invariant with respect to coordinate changes, we would expect the components to be unchanged at any given point. That is, we would expect the components of the gradient to be given by G_r = G_x, etc. However, that's clearly not the case, because the components of the temperature gradient with respect to polar coordinates are

    G_r = ∂T/∂r,    G_θ = ∂T/∂θ,    G_ϕ = ∂T/∂ϕ

which represent the derivatives of temperature with respect to the polar coordinates, not with respect to the Cartesian coordinates. Thus, even though the value of the temperature itself at any point is independent of the choice of coordinate systems, the components of the gradient of the temperature at a given point clearly depend on the coordinate system we are using.

Fortunately, it's still possible to express the components of G_(rθϕ) in terms of the components of G_(xyz), but we need to take into account the relation between the polar coordinates and the Cartesian coordinates. For example, the conversion for G_r looks like this

    G_r = (∂x/∂r) G_x + (∂y/∂r) G_y + (∂z/∂r) G_z

where x, y, and z are regarded as functions of r, θ, and ϕ (the inverses of the relations r(x,y,z), θ(x,y,z), and ϕ(x,y,z)), and the derivatives are partials. Entities like the temperature gradient whose components transform according to this kind of rule are called covariant.
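The covariant rule can be checked numerically. The following is a minimal sketch, assuming a made-up temperature field and the convention that θ is measured from the z axis; the helper names `partial` and `to_cartesian` are hypothetical. It computes G_r in two ways: directly as ∂T/∂r, and by weighting the Cartesian components G_x, G_y, G_z with ∂x/∂r, ∂y/∂r, ∂z/∂r.

```python
import math

def T(x, y, z):
    # Hypothetical temperature field (assumed for illustration).
    return 20.0 + 0.5 * x + 0.1 * y * y - 0.3 * z

def to_cartesian(r, theta, phi):
    # Assumed convention: theta from the z axis, phi the azimuth.
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def partial(f, args, i, h=1e-6):
    # Central-difference estimate of the partial derivative of f
    # with respect to its i-th argument.
    up, down = list(args), list(args)
    up[i] += h
    down[i] -= h
    return (f(*up) - f(*down)) / (2.0 * h)

sph = (2.0, 0.7, 0.4)                       # sample point (r, theta, phi)
xyz = to_cartesian(*sph)

# Cartesian components of the gradient: G_x, G_y, G_z.
G_xyz = [partial(T, xyz, i) for i in range(3)]

# Partial derivatives dx/dr, dy/dr, dz/dr at the sample point.
dxyz_dr = [
    partial(lambda r_, th, ph, k=k: to_cartesian(r_, th, ph)[k], sph, 0)
    for k in range(3)
]

# Covariant rule: G_r = (dx/dr) G_x + (dy/dr) G_y + (dz/dr) G_z.
G_r_rule = sum(d * g for d, g in zip(dxyz_dr, G_xyz))

# Direct computation: G_r = dT/dr with T expressed in polar coordinates.
G_r_direct = partial(lambda r_, th, ph: T(*to_cartesian(r_, th, ph)), sph, 0)

print(G_r_rule, G_r_direct)  # agree up to finite-difference error
```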

Finally, suppose the water in the tank is not perfectly motionless, but has some velocity at each point. Like the gradient, this is a vector at each point. The components of the velocity vector are

    V_x = ∂x/∂t,    V_y = ∂y/∂t,    V_z = ∂z/∂t

where t stands for time. It's worthwhile to compare these components with those of the temperature gradient considered previously. With the gradient we had G_x = ∂T/∂x whereas with the velocity we have V_x = ∂x/∂t. So the gradient consists of partial derivatives of some "other" variable (such as the temperature T) with respect to the coordinates, whereas the velocity consists of partial derivatives of the coordinates with respect to some "other" variable (such as the time t).

Like the gradient, the velocity vector is not invariant under coordinate transformations, and again the conversion depends on the relation between the two sets of coordinates. However, the conversion has a different form. For example, the first component of the velocity vector in polar coordinates is given by

    V_r = (∂r/∂x) V_x + (∂r/∂y) V_y + (∂r/∂z) V_z

and similarly for the other two components. Entities that transform from one coordinate system to another according to this kind of rule are called contravariant. Notice that this differs from the transformation for the gradient, because in this case we multiply the terms on the right side by ∂r/∂x, ∂r/∂y, and ∂r/∂z respectively, whereas in the gradient transformation we multiplied the terms on the right side by ∂x/∂r, ∂y/∂r, and ∂z/∂r respectively. This discussion has focused on scalars and vectors, but the same ideas apply to tensors of any order. We can also have "mixed" tensors, which are covariant with respect to some of their indices and contravariant with respect to others.
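The contravariant rule for V_r can be checked in the same spirit. This is only a sketch under stated assumptions: the trajectory `position(t)` of a particle of water is invented for the example, and the time derivatives are taken along that path by central differences. It compares the rate of change of r computed directly with the weighted sum of V_x, V_y, V_z using ∂r/∂x = x/r, ∂r/∂y = y/r, ∂r/∂z = z/r.

```python
import math

def position(t):
    # Hypothetical path of a small parcel of water (assumed for illustration).
    return (1.0 + 0.5 * t, 2.0 - 0.2 * t * t, 0.5 + math.sin(t))

def radius(x, y, z):
    return math.sqrt(x * x + y * y + z * z)

def derivative(f, t, h=1e-6):
    # Central-difference estimate of df/dt.
    return (f(t + h) - f(t - h)) / (2.0 * h)

t0 = 1.3
x, y, z = position(t0)
r = radius(x, y, z)

# Cartesian velocity components V_x, V_y, V_z at time t0.
V_xyz = [derivative(lambda t, k=k: position(t)[k], t0) for k in range(3)]

# Partial derivatives dr/dx, dr/dy, dr/dz for r = sqrt(x^2 + y^2 + z^2).
dr_dxyz = [x / r, y / r, z / r]

# Contravariant rule: V_r = (dr/dx) V_x + (dr/dy) V_y + (dr/dz) V_z.
V_r_rule = sum(d * v for d, v in zip(dr_dxyz, V_xyz))

# Direct computation: rate of change of r along the path.
V_r_direct = derivative(lambda t: radius(*position(t)), t0)

print(V_r_rule, V_r_direct)  # agree up to finite-difference error
```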

At this point people often wonder how we can talk about a vector being contravariant or covariant when the direction and magnitude of a vector (which are its defining properties) are actually invariant with respect to coordinate changes. This question points out a problem with the terminology. People commonly talk about contravariant and covariant vectors and tensors, when they really mean contravariant and covariant components. A given velocity vector (for example) has whatever direction and magnitude it has, independent of the coordinate system we use to express it. So it's true that the velocity vector itself doesn't change when we switch coordinate systems. However, the components of the vector change.

For example, suppose we have a velocity vector in the plane with components [1,1] relative to a particular xy coordinate system. This vector has a magnitude of √2 and points 45 degrees up from the x axis. However, if we rotate the coordinate system about the origin so that the x axis lines up with the vector, it now has components [√2,0]. Notice that its magnitude is still √2 because the magnitude is invariant, but the components of the vector are different. We didn't change the direction of the vector; we changed the orientation of the coordinate system, so the components of the vector had to change accordingly.
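The rotation example can be reproduced in a few lines. In this sketch the function name `components_in_rotated_frame` is hypothetical; it returns the components of a fixed vector relative to axes that have been rotated counterclockwise by the given angle.

```python
import math

def components_in_rotated_frame(v, angle):
    # The vector stays fixed; the axes rotate counterclockwise by `angle`,
    # so the components are found by projecting onto the rotated axes.
    vx, vy = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * vx + s * vy, -s * vx + c * vy)

v = (1.0, 1.0)
v_rot = components_in_rotated_frame(v, math.pi / 4)  # rotate axes by 45 degrees

magnitude_before = math.hypot(*v)
magnitude_after = math.hypot(*v_rot)
print(v_rot, magnitude_before, magnitude_after)
```

The components change from [1,1] to approximately [√2,0], while the magnitude computed from either set of components is the same.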

Now, if we accept that the components of a tensor have to change when we change coordinate systems, we might still wonder why they change differently depending on whether they are contravariant or covariant. The distinction between these two kinds of components is a bit subtle. Essentially, the contravariant components of a vector are directed parallel to the coordinate axes, whereas the covariant components of a vector are directed normal (perpendicular) to constant coordinate surfaces. Of course, in the case of orthogonal Cartesian coordinates the axes are, by definition, normal to constant coordinate surfaces, so there is no distinction between contravariant and covariant components. This is why we don’t encounter the distinction until we begin to consider transformations between more general kinds of coordinate systems.
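To see the two kinds of components come apart, it may help to try oblique (non-orthogonal) axes in the plane. The sketch below is an illustration under assumptions: the first axis lies along x, the second is tilted at an angle alpha to it, and both basis vectors have unit length. The contravariant components are taken as the coefficients that build the vector from the basis vectors along the axes, and the covariant components as the projections of the vector onto those basis vectors (equivalently, its coefficients along the dual directions normal to the lines of constant coordinate). The function name `oblique_components` is hypothetical.

```python
import math

def oblique_components(v, alpha):
    vx, vy = v
    e1 = (1.0, 0.0)                          # basis vector along the first axis
    e2 = (math.cos(alpha), math.sin(alpha))  # basis vector along the tilted axis

    # Contravariant components (a, b): solve v = a*e1 + b*e2.
    b = vy / math.sin(alpha)
    a = vx - b * math.cos(alpha)

    # Covariant components: dot products of v with e1 and e2.
    c1 = vx * e1[0] + vy * e1[1]
    c2 = vx * e2[0] + vy * e2[1]
    return (a, b), (c1, c2)

v = (1.0, 1.0)
contra_skew, co_skew = oblique_components(v, math.pi / 3)  # axes 60 degrees apart
contra_orth, co_orth = oblique_components(v, math.pi / 2)  # orthogonal axes

print(contra_skew, co_skew)  # the two sets differ for oblique axes
print(contra_orth, co_orth)  # the two sets coincide for orthogonal axes
```

With the axes 60 degrees apart the two sets of components differ, whereas with orthogonal axes they coincide, in line with the remark above about Cartesian coordinates.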
