Eigen Duality and Quasi-Eigen Systems

 

For a given square matrix M, an eigenvector is a vector x such that multiplication of x by M simply re-scales x while leaving the direction of x unchanged. In other words, x is an eigenvector of M if and only if Mx = λx for some scalar λ, which is called the eigenvalue corresponding to x. This condition can also be written in the equivalent form

$$(M - \lambda I)\,x = 0 \qquad (1)$$

where I is the identity matrix and 0 is the zero vector. Naturally the equation is trivially satisfied by x = 0, but the equation may also have non-trivial solutions if the determinant of the operator is zero. This requires that λ be a root of the characteristic equation

$$|M - \lambda I| = 0 \qquad (2)$$

which is a polynomial in λ of degree equal to the order of the matrix M. If we multiply through equation (1) by (-y2/λ)M2 for some arbitrary non-zero scalar y2 and non-singular square matrix M2 (and assuming λ is not zero) we get the equivalent expression

$$(y_1 M_1 + y_2 M_2)\,x = 0 \qquad (3)$$

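where y1 = -y2/λ and M1 = M2M, but now the “eigenvalue” is represented by two numbers, y1 and y2, instead of just the single number λ. Of course, only the ratio of these two numbers is significant, so each eigenvalue λ is represented by a ray through the origin of the (y1, y2) plane. Although equation (3) is equivalent to (1), it highlights an important symmetry, because it shows that there is no essential difference between the roles of eigenvalues and eigenvectors. To make this explicit, consider the simple case where M1 and M2 are square matrices of order 2, and x is a column vector with two components x1 and x2. Letting $m^{1}_{ij}$ and $m^{2}_{ij}$ denote the elements of M1 and M2 respectively, we can multiply out the left side of equation (3) explicitly and show that it can be written in either of two forms:

$$\left( y_1 \begin{bmatrix} m^{1}_{11} & m^{1}_{12} \\ m^{1}_{21} & m^{1}_{22} \end{bmatrix} + y_2 \begin{bmatrix} m^{2}_{11} & m^{2}_{12} \\ m^{2}_{21} & m^{2}_{22} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \left( x_1 \begin{bmatrix} m^{1}_{11} & m^{2}_{11} \\ m^{1}_{21} & m^{2}_{21} \end{bmatrix} + x_2 \begin{bmatrix} m^{1}_{12} & m^{2}_{12} \\ m^{1}_{22} & m^{2}_{22} \end{bmatrix} \right) \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = 0$$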

When written in the first form, y1 and y2 are the numerator and denominator of the eigenvalue, and x1 and x2 are the components of the eigenvector. But when written in the second form, the roles are reversed, i.e., y1 and y2 are the components of the eigenvector, and x1 and x2 are the numerator and denominator of the eigenvalue. (Note that only the direction of an eigenvector is constrained by the system of equations, so only the ratio of its components is significant, just as only the ratio of the numerator and denominator of the eigenvalue is significant.) Of course, the coefficient matrices are decomposed differently, in one case having elements of the form $m^{1}_{ij}$ and $m^{2}_{ij}$, and in the other case having elements of the form $m^{j}_{i1}$ and $m^{j}_{i2}$ respectively, but this simply reflects the fact that x and y lie in different component spaces. There is no justification for giving either of them priority over the other.
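The equivalence of the two forms can be checked numerically. The following is a minimal sketch, assuming NumPy; the matrices `M` and `M2` and the scalar `y2` are hypothetical values chosen only for illustration, not taken from the text. It builds M1 = M2M and y1 = -y2/λ from an ordinary eigenpair, then evaluates the same system once with y weighting the matrices and x as the vector, and once with the roles exchanged.

```python
import numpy as np

# Hypothetical example data (assumptions for illustration only).
M = np.array([[2.0, 1.0],
              [1.0, 3.0]])
M2 = np.array([[1.0, 2.0],
               [0.0, 1.0]])    # arbitrary non-singular matrix
y2 = 1.7                       # arbitrary non-zero scalar

# One ordinary eigenpair of M: M x = lam0 x.
lam, vecs = np.linalg.eig(M)
lam0, x = lam[0], vecs[:, 0]

# Symmetrical form: M1 = M2 M and y1 = -y2 / lam0.
M1 = M2 @ M
y = np.array([-y2 / lam0, y2])

# T[s, i, j] is element (i, j) of matrix number s (s = 0 for M1, s = 1 for M2).
T = np.stack([M1, M2])

# First form: y weights the two matrices, x is the "eigenvector".
first_form = (y[0] * T[0] + y[1] * T[1]) @ x

# Second form: x weights two re-sliced matrices, y is the "eigenvector".
# N[j, i, s] = T[s, i, j], so N[j] collects column j of M1 and of M2.
N = np.transpose(T, (2, 1, 0))
second_form = (x[0] * N[0] + x[1] * N[1]) @ y

print(first_form, second_form)  # both vanish up to rounding error
```

Both expressions are the same double sum over the three-index array `T`, which is why neither x nor y has any built-in priority in the computation.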

 

The duality between eigenvalues and eigenvectors applies for systems of any order. It is simply disguised in most applications because we artificially break the symmetry by forcing one of the matrices to be the identity and forcing the scalar multiplier of the other matrix to be unity. This not only obscures the natural duality between eigenvectors and eigenvalues, it also entails an unwarranted specialization of the general form by not allowing additive partitions of the coefficient matrix. In addition, writing the equation in the symmetrical form suggests two interesting mathematical generalizations, and also has some interesting relevance to the interpretation of physical theories, especially quantum mechanics. We discuss each of these topics below.

 

First, to clarify the symmetry between eigenvalues and eigenvectors, we will define a notation and index convention for matrices and vectors. We’re accustomed to dealing with two different kinds of vectors, which we may call column and row vectors, but these are just two of infinitely many possible kinds of vectors, one for each of the possible dimensions of an array. Let us combine the two original matrices M1 and M2 into a single three-dimensional matrix, denoted by $M_{\mu\nu\sigma}$ where μ, ν, σ are indices ranging from 1 to 2, and let $X_{\mu 11}$ denote the vector with elements $x_{111} = x_1$ and $x_{211} = x_2$ (the components of the eigenvector x), and let $Y_{1\nu 1}$ denote the vector with elements $y_{111} = y_1$ and $y_{121} = y_2$. In general we indicate multiplications using the summation convention over repeated indices in a single term. Also, if corresponding indices of two arguments are both explicit numerals, or both dummy indices, we set that index of the product to 1, whereas if the index is a numeral or dummy (repeated) in one argument but a variable in the other, we set that index of the product to the variable. To illustrate, the product P given by ordinary matrix multiplication of the original two matrices could be expressed as

 

 

With this notation, our eigenvalue system is written in the form

$$M_{\mu\nu\sigma}\,X_{\mu 11}\,Y_{1\nu 1} = 0_{11\sigma}$$

Since X and Y lie along different (orthogonal) dimensions of the array, the two contractions commute. Thus this eigenvalue problem, when expressed in its natural symmetrical form, simply consists of finding two vectors that, when used as relative weights for summing and contracting a three-dimensional array in two of its dimensions, yield a zero vector in the remaining dimension. Of course, we could also multiply by a vector in the third dimension to collapse the matrix down to a scalar 0, which would result in a continuous locus of eigen-solutions. For any choice of one of those three vectors, the remaining two would have two discrete solutions (up to the arbitrary scale factors).
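The contraction described above maps directly onto `numpy.einsum`. The sketch below is illustrative only: the matrices `M1` and `M2` are hypothetical, and the storage layout `Marr[mu, nu, sigma]` (column index, matrix number, row index) is one possible assignment of the three array dimensions, assumed here so that X contracts the first dimension and Y the second. The two solutions are obtained by fixing y1 = 1 and solving an ordinary eigenvalue problem, which presumes M2 is non-singular.

```python
import numpy as np

# Hypothetical 2x2 coefficient matrices (assumptions for illustration).
M1 = np.array([[1.0, 2.0],
               [0.0, 3.0]])
M2 = np.array([[2.0, 0.0],
               [1.0, 1.0]])

# Marr[mu, nu, sigma] = entry (row sigma, column mu) of matrix number nu.
Marr = np.stack([M1.T, M2.T], axis=1)

def contract(X, Y):
    """Contract the array with X along its first axis and Y along its second."""
    return np.einsum('mns,m,n->s', Marr, X, Y)

# With y1 = 1 and y2 = r, (M1 + r M2) x = 0 becomes inv(M2) M1 x = -r x.
w, V = np.linalg.eig(np.linalg.solve(M2, M1))
solutions = [(V[:, k], np.array([1.0, -w[k]])) for k in range(2)]

# Each (X, Y) pair contracts the array to the zero vector.
residuals = [contract(X, Y) for X, Y in solutions]

# The order of the two contractions does not matter.
X, Y = solutions[0]
x_then_y = np.einsum('ns,n->s', np.einsum('mns,m->ns', Marr, X), Y)
y_then_x = np.einsum('ms,m->s', np.einsum('mns,n->ms', Marr, Y), X)

print(residuals, x_then_y, y_then_x)
```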

 

It might seem as if the duality between eigenvalues and eigenvectors exists only for matrices of order 2, because there are only two terms in the traditional eigenvalue equation (3), with coefficients representing the numerator and denominator of the traditional scalar eigenvalue. However, the symmetrical form immediately leads to a natural generalization of (3), such that we can partition the operator into more than just two parts. For example, by partitioning the coefficient matrix into three parts (instead of just two), we have a system described by the equation

$$(\alpha A + \beta B + \gamma C)\,x = 0$$

where A, B, C are square matrices of order N, and α, β, γ are scalars. As before, this equation can have non-trivial solutions only if the determinant of the overall operator vanishes, i.e.,

$$|\alpha A + \beta B + \gamma C| = 0 \qquad (5)$$

This represents a homogeneous polynomial of degree N in the three scalars α, β, γ. Again, only the ratios of these components are significant, so the “eigenvalues” of the system consist of rays through the origin of a three-dimensional space, just as (if N = 3) the eigenvectors are rays through the origin of a three-dimensional space. If we normalize the projective space of the eigenvalues (which we can do in various ways, such as by dividing through the characteristic equation by the Nth power of one of the eigen-components, or by stipulating that α + β + γ = 1), we get an equation of degree N in two variables, so the eigenvalues now consist of a continuous locus of points on the normalized surface (a conic locus when N = 2, since the equation is then quadratic). An alternative is to normalize the length of the eigenvalue to unity, i.e., to stipulate that α² + β² + γ² = 1, in which case the eigenvalues consist of a continuous locus of points on the unit sphere in three dimensions, as do the normalized eigenvectors.
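As a rough numerical illustration of the three-part system, the sketch below (assuming NumPy, with hypothetical order-2 matrices `A`, `B`, `C` that are not from the text) checks that the determinant is homogeneous of degree N, and finds points of the eigenvalue locus by fixing α and β and solving for γ. The helper names `char_det` and `gammas_on_locus` are introduced here for illustration, and the method presumes C is non-singular.

```python
import numpy as np

# Hypothetical coefficient matrices of order N = 2 (assumptions).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 1.0]])
N = A.shape[0]

def char_det(alpha, beta, gamma):
    """Determinant of the combined operator alpha*A + beta*B + gamma*C."""
    return np.linalg.det(alpha * A + beta * B + gamma * C)

def gammas_on_locus(alpha, beta):
    """Values of gamma for which the determinant vanishes at the given alpha, beta.

    |alpha A + beta B + gamma C| = 0 exactly when -gamma is an eigenvalue of
    inv(C) (alpha A + beta B), assuming C is non-singular.
    """
    return -np.linalg.eigvals(np.linalg.solve(C, alpha * A + beta * B))

# Homogeneity: scaling (alpha, beta, gamma) by t scales the determinant by t**N.
point = np.array([0.3, -1.2, 0.7])
t = 2.5
homogeneous = np.isclose(char_det(*(t * point)), t**N * char_det(*point))

# Each (alpha, beta, gamma) found this way is a point on an eigenvalue ray,
# and every multiple of it lies on the same ray.
alpha, beta = 1.0, 0.5
locus = [(alpha, beta, g) for g in gammas_on_locus(alpha, beta)]
print(homogeneous, locus)
```

Sweeping `alpha` and `beta` over a grid and collecting the resulting points traces out the continuous locus described in the text.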

 

By partitioning the coefficient matrix into N parts, we get a system with N-dimensional eigenvalues and N-dimensional eigenvectors, and the coefficient matrices are N × N square matrices regardless of whether we transpose the eigenvalues and eigenvectors. However, the duality between eigenvalues and eigenvectors is not limited to such cases. In general we can have different numbers of dimensions for those entities. For example, consider a system described by

$$(y_1 A + y_2 B)\,x = 0$$

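where the two coefficient matrices A and B are of order three. In this case the explicit system equations can be written as

$$\left( y_1 \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} + y_2 \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} \right) \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \left( x_1 \begin{bmatrix} a_{11} & b_{11} \\ a_{21} & b_{21} \\ a_{31} & b_{31} \end{bmatrix} + x_2 \begin{bmatrix} a_{12} & b_{12} \\ a_{22} & b_{22} \\ a_{32} & b_{32} \end{bmatrix} + x_3 \begin{bmatrix} a_{13} & b_{13} \\ a_{23} & b_{23} \\ a_{33} & b_{33} \end{bmatrix} \right) \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = 0$$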

Thus the duality between x and y still applies, in the sense that either of those “eigenrays” can equally well be regarded as the eigenvalue or the eigenvector, with the understanding that the coefficient matrices need not be square. Of course, if the coefficient matrices are not square, the eigen condition can no longer be expressed as the vanishing of the determinant of the sum of the coefficient matrices, because the definition of the determinant applies only to square matrices. Nevertheless, we still have a perfectly well-defined eigen condition, with the understanding that it consists, in general, of a set of simultaneous equations rather than just a single equation. In the above example, the eigen condition on y is expressed by the vanishing of the determinant of the sum of the two square coefficient matrices, which implies just a single polynomial in the $y_j$ parameters, whereas the eigen condition on x consists of any two of the three simultaneous conditions

$$(Ax)_1 (Bx)_2 - (Ax)_2 (Bx)_1 = 0, \qquad (Ax)_1 (Bx)_3 - (Ax)_3 (Bx)_1 = 0, \qquad (Ax)_2 (Bx)_3 - (Ax)_3 (Bx)_2 = 0$$
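The non-square dual form can be explored numerically. In the sketch below (NumPy assumed; the order-three matrices `A` and `B` are hypothetical), each solution x of (y1 A + y2 B)x = 0 is used to build the 3 × 2 matrix whose columns are Ax and Bx, which is the sum of the three non-square coefficient matrices weighted by x1, x2, x3. The eigen condition on x is taken here to be the vanishing of the 2 × 2 minors of that matrix, which is an interpretation of the "simultaneous conditions" rather than a quotation of the original formula.

```python
import numpy as np

# Hypothetical order-three coefficient matrices (assumptions).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = np.diag([1.0, 2.0, 3.0])

# With y1 = 1, (A + y2 B) x = 0 becomes inv(B) A x = -y2 x.
w, V = np.linalg.eig(np.linalg.solve(B, A))

def dual_matrix(x):
    """3x2 matrix with columns A x and B x, so dual_matrix(x) @ y = (y1 A + y2 B) x."""
    return np.column_stack([A @ x, B @ x])

def minors(Nx):
    """The three 2x2 minors of a 3x2 matrix."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    return np.array([Nx[i, 0] * Nx[j, 1] - Nx[j, 0] * Nx[i, 1] for i, j in pairs])

results = []
for k in range(3):
    x = V[:, k]
    y = np.array([1.0, -w[k]])
    Nx = dual_matrix(x)
    # Residual of the dual system and the largest minor, both near zero.
    results.append((np.abs(Nx @ y).max(), np.abs(minors(Nx)).max()))

print(results)
```

A 3 × 2 matrix has a non-trivial null vector y exactly when its rank is less than 2, which is why the condition on x takes the form of several simultaneous equations instead of a single determinant.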

As a further generalization, we need not be limited to systems with just two “eigenrays” (i.e., systems with eigenvalues and eigenvectors). Using the index notation discussed previously, we can define overall coefficient matrices with any number of dimensions, and then contract them with two or more eigenrays. For example, we can consider systems such as

$$M_{\alpha\beta\gamma\delta}\,X_{\alpha 111}\,Y_{1\beta 11}\,Z_{11\gamma 1} = 0_{111\delta}$$

where the indices α, β, γ, δ need not all vary over the same ranges. Each of the vectors X, Y and Z represents a class of eigenrays for the system. This shows that the distinction between eigenvalues and eigenvectors is an artificial one. A good illustration of this is given by the simplest expression of this form, namely

$$M_{\mu\nu}\,X_{\mu 1}\,Y_{1\nu} = 0$$

where the indices range from 1 to 2. This represents the single constraint

$$m_{11}x_1y_1 + m_{12}x_1y_2 + m_{21}x_2y_1 + m_{22}x_2y_2 = 0$$

In terms of the ratios φ = x1/x2 and θ = y1/y2, this can be written as

$$\varphi = -\frac{m_{21}\theta + m_{22}}{m_{11}\theta + m_{12}}$$

Thus we can regard the ratio (x1/x2) as the eigenvalue corresponding to the eigenvector [y1,y2], and conversely we can regard the ratio (y1/y2) as the eigenvalue corresponding to the eigenvector [x1,x2], and these ratios are related by a linear fractional (Möbius) transformation. Of course, this simple system doesn’t restrict the set of possible eigenvalues or eigenvectors, but (provided the determinant of the coefficient matrix is non-zero) it does establish a one-to-one holomorphic mapping between them. Even this simple system has applications in both relativity and quantum mechanics, since the Möbius transformations can encode Lorentz transformations and rotations, and with stereographic projection onto the Riemann sphere they can be used to represent the state vectors of a simple physical system with two basis states.
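The linear fractional relation between the two ratios follows from dividing the bilinear constraint by x2 y2. A small sketch, assuming NumPy and a hypothetical coefficient matrix `m` with non-zero determinant, writes the map in both directions and checks that the resulting pair satisfies the constraint; the function names are introduced here for illustration only.

```python
import numpy as np

# Hypothetical coefficients m[mu, nu] with non-zero determinant (assumption).
m = np.array([[1.0, 2.0],
              [3.0, -1.0]])

def phi_from_theta(theta):
    """Solve m11*phi*theta + m12*phi + m21*theta + m22 = 0 for phi = x1/x2."""
    return -(m[1, 0] * theta + m[1, 1]) / (m[0, 0] * theta + m[0, 1])

def theta_from_phi(phi):
    """Solve the same constraint for theta = y1/y2."""
    return -(m[0, 1] * phi + m[1, 1]) / (m[0, 0] * phi + m[1, 0])

# The ratios may be complex, as for points of the Riemann sphere.
theta = 0.4 + 0.3j
phi = phi_from_theta(theta)

# Representative vectors for the two rays and the bilinear constraint.
x = np.array([phi, 1.0])
y = np.array([theta, 1.0])
constraint = x @ m @ y

print(phi, constraint, theta_from_phi(phi))
```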

 

So far we’ve considered only purely mathematical generalizations and re-interpretations of eigenvalue problems, but another kind of generalization is suggested by ideas from physics. The traditional asymmetrical form Mx = λx used in the representation of a “measurement” performed by one system on another in the context of quantum mechanics gives priority to one of the two interacting systems, treating one as the observer and the other as the observed. But surely both systems are “making an observation” of each other (i.e., interacting with each other), so just as there is an operator M1 representing the observation performed by one system, there must be a complementary operator M2 representing the reciprocal “observation” performed by the other system. This strongly suggests that the symmetrical form, with some non-trivial partition of the coefficient matrix, is more likely than the asymmetrical form to give a suitable representation of physical phenomena. According to this view, the eigenvalue arising from the application of an observable operator is properly seen as part of the state vector of the “observing system”, just as the corresponding state vector to which the “observed system” is projected can be seen as the “value” arising from the measurement which the “observed system” has performed on the “observing system”.

 

However, it might be argued that the symmetrical form presents a problem when we try to interpret the eigenvalues, which supposedly signify the results of the observations. As emphasized above, the two individual eigenvalues y1 and y2 in the symmetrical form are indeterminate, because only their ratio y2/y1 = -λ is determined by the equation. This conflicts with the fact that physical observations yield definite absolute values, not just proportionalities. Observations are not arbitrarily re-scalable, at least not on the smallest scales. There is a definite absolute scale, associated with the existence of a minimal action greater than zero, i.e., a finite quantum of action, characterized by Planck’s constant h.

 

To represent this situation accurately, perhaps we ought to replace the 0 in equation (3) with a vector of magnitude h, which we will denote as $\mathbf{h}$. Having done this, the equation takes the inhomogeneous form

$$(y_1 M_1 + y_2 M_2)\,x = \mathbf{h}$$

We call this a quasi-eigen system. Now, assuming the matrix inside the parentheses is not singular, we could simply multiply both sides by the inverse of that matrix to give the unique solution for x, but this corresponds to the trivial x = 0 solution in the traditional formulation, except that this solution will be on the order of h rather than 0. Hence this still represents a kind of trivial null solution, whereas (just as in the traditional case) the equation also possesses non-trivial solutions, provided the combined matrix on the left side satisfies a certain condition. Specifically, there will be non-trivial solutions x provided the determinant of the combined matrix is on the order of h. Thus we require

$$|y_1 M_1 + y_2 M_2| = h$$

To give a simple illustration, consider the two matrices

 

 

The condition for non-trivial solution is

 

 

If the right hand side were zero, we could divide through by β², and this would just be the characteristic equation with two roots, representing the two eigenvalues of the system. But since the right hand side is not zero, this equation gives a surface, which we will refer to as an eigensurface. In this simple example the surface is a conic locus as illustrated in the figure below.

 

 

The solution x is of the form

 

 

For any point of the (α,β) plane (other than the origin) we have a solution, but since h is extremely small, the solution components will be correspondingly small unless the determinant is on the same order of magnitude as h. The eigensurface represents the locus of points on the (α,β) plane where the determinant equals h, so the leading factor in the expression for x is unity. However, the components of the solution must still be extremely small because α and β are extremely small on the eigensurface, except where the eigensurface approaches the original eigenvalues. On those asymptotes (and only there) we can make α and β arbitrarily large, but of course they must maintain essentially a constant ratio, which approaches the traditional eigenvalue the further we are from the origin. In that case the components of x approach a constant ratio to each other, i.e., they approach the traditional eigenvector.
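The behaviour near an asymptote can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration: the matrices `A` and `B = -I` are hypothetical (they are not the matrices of the example above), `h` is an arbitrary small number standing in for the quantum of action, and the direction of the vector of magnitude h is chosen arbitrarily. Because the determinant of an order-2 system is homogeneous of degree 2, the eigensurface point along a given direction is found by rescaling.

```python
import numpy as np

# Hypothetical matrices and scale (assumptions for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = -np.eye(2)
h = 1e-6
h_vec = h * np.array([1.0, 0.0])   # a vector of magnitude h

def eigensurface_point(t):
    """Point (alpha, beta) at direction angle t with |alpha A + beta B| = h, or None."""
    c, s = np.cos(t), np.sin(t)
    d = np.linalg.det(c * A + s * B)
    if d <= 0.0:
        return None                # this ray does not meet the eigensurface
    r = np.sqrt(h / d)             # an order-2 determinant scales as r**2
    return r * c, r * s

def solve_x(alpha, beta):
    """Solution of the inhomogeneous system (alpha A + beta B) x = h_vec."""
    return np.linalg.solve(alpha * A + beta * B, h_vec)

# With B = -I the ordinary eigenvalue rays are beta/alpha = eigenvalues of A.
lam, vecs = np.linalg.eigh(A)
t_asym = np.arctan(lam[0])

# Approach the asymptote from the side where the determinant is positive.
for eps in (1e-1, 1e-2, 1e-4):
    alpha, beta = eigensurface_point(t_asym - eps)
    x = solve_x(alpha, beta)
    print(eps, beta / alpha, np.linalg.norm(x), x / np.linalg.norm(x))
```

As `eps` shrinks, the ratio `beta / alpha` approaches the ordinary eigenvalue `lam[0]`, the norm of x grows, and the direction of x approaches the ordinary eigenvector `vecs[:, 0]` up to sign.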

 

The eigensurface has several interesting features. It provides a continuous path from one eigenvalue (asymptote) to the other, but it also consists of two disjoint branches. Which pairs of asymptotes are connected depends on the particular conic that describes the locus. The hyperbolic nature of the locus is a consequence (in ordinary quantum mechanics) of the stipulation that the eigenvalues of observable operators are purely real. If h is zero (as it is traditionally taken to be), the discriminant of the characteristic equation must be positive, and hence the conic when h is non-zero is a hyperbola. However, since h is non-zero, it is no longer necessary to require a positive discriminant (so observable operators need not be Hermitian), since we can get real eigenvalues even for characteristic equations with negative discriminants. Thus we can allow for elliptical eigensurfaces as depicted below.

 

 

Of course, the solution vectors x for such systems are necessarily extremely close to null, since there are no asymptotes extending far from the origin. If h is sufficiently small, we might never notice the existence of such systems. The potential for parabolic systems is also intriguing, as they would be single-valued but would allow for solution vectors of arbitrary size, and those vectors would not have components in constant proportion, since the parabola is not asymptotic to any fixed rays from the origin.

 

In general the degree of the eigensurface equals the order of the system. For two square matrices A and B of order n, we can express the determinant |αA+βB| in terms of the determinants $|\alpha_j A + \beta_j B|$ with j = 1, 2, …, n+1, where the $(\alpha_j, \beta_j)$ are n+1 independent “basis points” of the (α,β) plane. The relation can be written in the form

 

 

Equation (8) was written using the basis points (1,0), (0,1), and (1,1), but if we happen to know the n ordinary eigenvalues $(\mu_j, \nu_j)$ such that $|\mu_j A + \nu_j B| = 0$ we could use them as n of the required n+1 basis points. (Of course this can be done only if there are n distinct eigenvalues.) We can then freely select just one more basis point, in terms of which to express the eigensurface equation. In doing so we must take care to preserve the scale factor, since ultimately we are setting a determinant not to 0 but to h. The characteristic equation giving the ordinary eigenvalues doesn’t contain the scale information, since only the ratios are important, so we can’t simply identify the characteristic polynomial with the determinant. Writing the ordinary characteristic equation in the symmetrical form

$$\prod_{j=1}^{n} (\nu_j \alpha - \mu_j \beta) = 0$$

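we have

$$|\alpha A + \beta B| = |k_1 A + k_2 B| \prod_{j=1}^{n} \frac{\nu_j \alpha - \mu_j \beta}{\nu_j k_1 - \mu_j k_2}$$

for arbitrary constants k1 and k2 (not both equal to zero, and not lying on one of the eigenvalue rays), and hence we can write the equation of the eigensurface as

$$|k_1 A + k_2 B| \prod_{j=1}^{n} \frac{\nu_j \alpha - \mu_j \beta}{\nu_j k_1 - \mu_j k_2} = h$$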
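Both ways of fixing the scale of the determinant polynomial can be sketched numerically. The code below assumes NumPy and hypothetical matrices `A` and `B = -I`; with that choice of B the ordinary eigenvalue rays are simply (1, λ_j) for the eigenvalues λ_j of A, which is a convenience of the example and not a general rule. The first part recovers the coefficients of |αA+βB| from n+1 basis points, and the second part rebuilds the same determinant from the eigenvalue rays plus one extra basis point (k1, k2), using the factored form as an assumption about how the scale factor enters.

```python
import numpy as np

# Hypothetical matrices of order n = 2 (assumptions for illustration).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = -np.eye(2)
n = A.shape[0]

def det_coefficients(basis):
    """Coefficients R[k] of alpha**(n-k) * beta**k in |alpha A + beta B|."""
    H = np.array([[a**(n - k) * b**k for k in range(n + 1)] for a, b in basis],
                 dtype=float)
    J = np.array([np.linalg.det(a * A + b * B) for a, b in basis])
    return np.linalg.solve(H, J)

def det_from_coefficients(R, alpha, beta):
    return sum(R[k] * alpha**(n - k) * beta**k for k in range(n + 1))

# Basis points (1,0), (0,1), (1,1); R comes out as [5, -5, 1] up to rounding.
R = det_coefficients([(1, 0), (0, 1), (1, 1)])

# Eigenvalue rays (mu_j, nu_j) = (1, lam_j), valid here because B = -I.
lams = np.linalg.eigvals(A)

def det_factored(alpha, beta, k=(1.0, 0.0)):
    """Determinant rebuilt from the eigenvalue rays and one extra basis point k."""
    k1, k2 = k
    value = np.linalg.det(k1 * A + k2 * B)   # carries the absolute scale
    for lam_j in lams:
        mu, nu = 1.0, lam_j
        value *= (nu * alpha - mu * beta) / (nu * k1 - mu * k2)
    return value

alpha, beta = 0.3, -0.7
print(np.linalg.det(alpha * A + beta * B),
      det_from_coefficients(R, alpha, beta),
      det_factored(alpha, beta))
```

Setting any of these three equal expressions to h gives the eigensurface; the extra basis point is what supplies the scale that the characteristic equation alone does not contain.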

Naturally we can combine the “quantized eigen problem” with the generalization to multiple partitions discussed previously to represent interactions involving more than just two systems. It can certainly be argued that the conventional focus on purely binary interactions is not theoretically justified. If indeed there are “measurement” interactions involving (say) three systems, the relevant equation might be expressible in the form

$$(\alpha A + \beta B + \gamma C)\,x = \mathbf{h}$$

for observables A, B, and C. If the matrices are of order two, the eigensurface would then be given by

$$|\alpha A + \beta B + \gamma C| = h$$

This is the equation of a quadric surface in the three-dimensional space of α, β, γ.

 

The key distinction between systems described by an equation such as (9) and those described by an equation such as (5) is that the former involves an absolute scale, so it’s important to correctly define the absolute magnitudes (not just the ratios) of the coefficients of the polynomials defining the characteristic solutions. Thus it’s useful to understand the form of the expressions for those coefficients. Consider the three-partition system of order n, for which we have the determinant

$$|M| = \sum_{\sigma \in S} s(\sigma) \prod_{i=1}^{n} m_{i,\sigma_i}$$

where $m_{i,j}$ are the components of M, and σ represents a permutation [σ1,σ2,…,σn] of the indices 1 to n, and S is the set of all such permutations, and s(σ) is either +1 or -1 accordingly as the permutation σ is even or odd. In this example we make the substitutions

$$m_{i,j} = \alpha\,a_{i,j} + \beta\,b_{i,j} + \gamma\,c_{i,j}$$

where $a_{i,j}$, $b_{i,j}$, and $c_{i,j}$ are the components of A, B, C respectively, and we get a homogeneous polynomial of degree n in the parameters α, β, γ. Thus for some column vector R we have

$$|\alpha A + \beta B + \gamma C| = U R \qquad (10)$$

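where U is the row vector of the monomials of degree n in α, β, γ, which for n = 2 is

$$U = \begin{bmatrix} \alpha^2 & \beta^2 & \gamma^2 & \alpha\beta & \alpha\gamma & \beta\gamma \end{bmatrix}$$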

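The components of R can be found by choosing 6 independent basis points $(\alpha_j, \beta_j, \gamma_j)$ with j = 1, 2, …, 6 (one for each component of U when n = 2), and solving the system of equations

$$H R = J$$

where

$$H = \begin{bmatrix} U(\alpha_1,\beta_1,\gamma_1) \\ U(\alpha_2,\beta_2,\gamma_2) \\ \vdots \\ U(\alpha_6,\beta_6,\gamma_6) \end{bmatrix}, \qquad J = \begin{bmatrix} |\alpha_1 A + \beta_1 B + \gamma_1 C| \\ |\alpha_2 A + \beta_2 B + \gamma_2 C| \\ \vdots \\ |\alpha_6 A + \beta_6 B + \gamma_6 C| \end{bmatrix}$$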

Thus letting $H^{(i=J)}$ denote the matrix H with the ith column replaced with J, we can write the components of R explicitly as

$$R_i = \frac{|H^{(i=J)}|}{|H|}$$

We also have R = H⁻¹J, so we can express the desired determinant as a linear combination of the “basis determinants” (i.e., the components of J) by substituting into (10) to give

$$|\alpha A + \beta B + \gamma C| = U H^{-1} J = K J$$

where K = UH⁻¹. Hence we have KH = U, and letting $H^{(i=U)}$ denote the matrix H with the ith row replaced with U, the components of K can be written explicitly as

$$K_i = \frac{|H^{(i=U)}|}{|H|}$$
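The basis-point construction for the three-partition system of order two can be sketched as follows. NumPy is assumed, the matrices `A`, `B`, `C` are hypothetical, and the ordering of the monomials in `U` and the particular six basis points are arbitrary choices made for this illustration. The code solves HR = J, checks one component of R against the column-replacement (Cramer) formula, and then evaluates the determinant at a new point both as UR and as KJ.

```python
import numpy as np

# Hypothetical order-2 matrices (assumptions for illustration).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [0.0, 1.0]])

def U(alpha, beta, gamma):
    """Row vector of degree-2 monomials (the ordering is a free choice)."""
    return np.array([alpha**2, beta**2, gamma**2,
                     alpha * beta, alpha * gamma, beta * gamma])

def det3(alpha, beta, gamma):
    return np.linalg.det(alpha * A + beta * B + gamma * C)

# Six independent basis points, one per monomial.
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]
H = np.array([U(*p) for p in basis], dtype=float)
J = np.array([det3(*p) for p in basis])

# Coefficient vector R from H R = J.
R = np.linalg.solve(H, J)

def cramer_R(i):
    """R_i as |H with column i replaced by J| / |H|."""
    Hi = H.copy()
    Hi[:, i] = J
    return np.linalg.det(Hi) / np.linalg.det(H)

def K(alpha, beta, gamma):
    """Row vector K with K H = U, obtained by solving H^T K^T = U^T."""
    return np.linalg.solve(H.T, U(alpha, beta, gamma))

p = (0.4, -1.1, 0.9)
print(det3(*p), U(*p) @ R, K(*p) @ J)
```

The vector `K(*p)` gives the weights with which the six basis determinants combine to reproduce the determinant at the point `p`, so the absolute scale of the polynomial is carried entirely by `J`.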

Both of the novel aspects of eigen problems discussed above have potentially interesting implications for physical theory, particularly quantum mechanics. The duality between eigenvalues and eigenvectors, i.e., the fact that they may exchange roles, must be reconciled with the very distinct roles that these entities play in quantum mechanics. The eigenvector is associated with an observable state of a system, and its components are generally complex, whereas the eigenvalue is taken to be the purely real scalar result of a measurement. As mentioned above, the ordinary view in quantum mechanics is based on a definite (though arguably unjustified and poorly defined) distinction between observing and observed physical systems. Likewise the concept of quasi-eigen problems, according to which the determinant of the system is set not to 0 but to the (small) finite value h, might provide an alternative representation of quantum mechanics. It would be interesting to determine the correspondence between these quasi-eigen systems and the usual commutators (the quantum counterparts of Poisson brackets) used in quantum mechanics.

 
