I’m a firm believer that if you can’t understand some concept without resorting to complex numbers and invoking the fundamental theorem of algebra, then you don’t have a deep understanding of the concept.  This is especially a problem when teaching undergraduate mathematics to scientists and engineers, since some of the most crucial techniques (eigenanalysis and Fourier transforms) make heavy use of complex numbers.  The result of the “complex” arguments presented is that most students have a very difficult time with these subjects.  Ultimately, many treat them as magical tools that are hardly worth the time invested to understand.

Such a perspective is of course hogwash.

I know for a fact that a much more geometric (and in my opinion illuminating) interpretation of eigenanalysis is possible.  However, I had never seen a clear exposition from first principles that one could reasonably expect to present in a beginning linear algebra course.  Every explanation of the geometric perspective that I’ve seen has been post-hoc or relied on machinery like Lagrange multipliers—an unwelcome diversion.

I would not be surprised if someone else has come up with this proof before.  However, I was not able to quickly find a reference for it.  It makes no appeals to (a) the characteristic polynomial, (b) matrix polynomials, (c) complex numbers or (d) Lagrange multipliers.  I have only treated the real-symmetric case, although I expect that other arguments along similar lines are possible for non-symmetric real matrices.

Theorem: Let A be an n\times n real symmetric matrix.  Then there exists an eigenvector v with eigenvalue \lambda.  In other words, Av = \lambda v.

The proof:

We will hinge this discussion on the maximization of a function F_A(x,y), constructed from the matrix A.  Let F_A(x,y)=x\cdot Ay be a real-valued function defined on input vectors x, y\in\mathbb{R}^n constrained to lie on the unit sphere.  In other words, we impose the constraint |x|=|y|=1.  To begin with, we note that the function F_A is continuous and defined everywhere on the unit sphere.  Since the sphere is closed and bounded (that is, compact), the extreme value theorem guarantees that F_A attains a maximum at some pair of unit vectors (x,y).

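Now, I claim that if (x,y) is a maximal pair, then x must be collinear with Ay.  (If A=0 then every nonzero vector is an eigenvector with eigenvalue 0, so assume A\neq 0; then |Ay|>0 for some unit vector y.)  To see why, rewrite the dot product defining F_A as x\cdot Ay = |x||Ay|\cos\theta, where \theta is the angle between x and Ay.  Since |x|=1, and since |Ay| depends only on y while \cos\theta can be set by the choice of x, we may maximize the other two terms |Ay| and \cos\theta independently.  Once a y maximizing |Ay| is chosen, \cos\theta is optimized to be 1 by choosing x to point in the direction of Ay.  This gives us the identity x = Ay / |Ay|.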
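As a quick numerical sanity check of this step, here is a minimal Python sketch.  The matrix A, the random seed and the helper names (matvec, dot, norm, normalize) are all assumptions chosen for illustration and are not part of the argument.  For one fixed unit vector y, it compares x = Ay / |Ay| against many random unit vectors x.

```python
import math
import random

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    n = norm(v)
    return [a / n for a in v]

# Hypothetical symmetric example matrix (any real symmetric A with Ay != 0 works).
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

rng = random.Random(0)
y = normalize([rng.gauss(0.0, 1.0) for _ in range(3)])
Ay = matvec(A, y)

# The claimed optimal x for this fixed y, and the value F_A(x, y) it achieves.
x_best = normalize(Ay)
best_value = dot(x_best, Ay)

# Largest value of x . Ay seen over many random unit vectors x.
max_random = max(
    dot(normalize([rng.gauss(0.0, 1.0) for _ in range(3)]), Ay)
    for _ in range(10000)
)

print("x . Ay with x = Ay/|Ay|:", best_value)
print("|Ay|:                   ", norm(Ay))
print("best random x:          ", max_random)
```

The first two printed numbers should agree up to rounding, and no random x should exceed them, which is just the Cauchy-Schwarz inequality in action.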

Now, since our matrix A is symmetric, (y,x) must also be a maximal pair, since F_A(x,y) = x\cdot Ay = y\cdot Ax = F_A(y,x).  This gives us the identity y = Ax / |Ax|.  This swap of x and y is exactly where the symmetry of A enters the argument; for a non-symmetric matrix, x\cdot Ay and y\cdot Ax generally differ.

Now, before we get to the finale, I’d like to take a brief break to reflect on one of our earlier observations.  Recall that F_A(x,y)=|x||Ay|\cos\theta.  In our previous argument we concluded that both |x|=1 and \cos\theta=1 must hold for any maximal pair.  Although we didn’t note it at the time, this means that F_A(x,y)=|Ay| for our maximal pair (x,y).  By symmetry, we can then observe that |Ay| = F_A(x,y) = F_A(y,x) = |Ax|.  That is, x and y are equally stretched by A.

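Finally, we are ready to bring things to a head.  I claim that v=x+y is an eigenvector of A with eigenvalue \lambda = |Ax| = |Ay|, provided x+y\neq 0.  Rearranging the two identities as Ay = |Ay|x and Ax = |Ax|y, we get

Av = A(x+y) = Ax + Ay = |Ax|y + |Ay|x = \lambda(x+y) = \lambda v

If instead x+y=0, then y=-x, and the identity Ax = |Ax|y reads Ax = -\lambda x.  In that case x itself is an eigenvector, with eigenvalue -\lambda.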
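To see the construction at work numerically, here is a small Python sketch.  It is not part of the proof: the proof only asserts that a maximal pair exists, whereas this code searches for one by repeatedly applying the two identities x = Ay / |Ay| and y = Ax / |Ax| (in effect a power iteration, swapped in here purely for illustration).  The example matrix, the step count, the tolerance and all function names are assumptions.  The fallback to x when x+y vanishes covers matrices such as -A below, where the maximal pair found has y = -x.

```python
import math
import random

def matvec(A, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    n = norm(v)
    return [a / n for a in v]

def maximal_pair(A, steps=500, seed=0):
    """Alternate x = Ay/|Ay| and y = Ax/|Ax| from a random start (assumes A != 0)."""
    rng = random.Random(seed)
    y = normalize([rng.gauss(0.0, 1.0) for _ in A])
    for _ in range(steps):
        x = normalize(matvec(A, y))
        y = normalize(matvec(A, x))
    return x, y

def eigenpair(A, tol=1e-9):
    """Build (eigenvalue, eigenvector) from the pair, as in the final step of the proof."""
    x, y = maximal_pair(A)
    lam = norm(matvec(A, x))           # lambda = |Ax| = |Ay|
    v = [a + b for a, b in zip(x, y)]  # v = x + y
    if norm(v) > tol:
        return lam, v
    return -lam, x                     # x + y = 0, so Ax = -lambda x

def residual(A, lam, v):
    """Size of Av - lam*v; close to zero when (lam, v) is an eigenpair."""
    Av = matvec(A, v)
    return norm([a - lam * b for a, b in zip(Av, v)])

# Hypothetical symmetric example; its negation exercises the x + y = 0 case.
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
neg_A = [[-a for a in row] for row in A]

lam_pos, v_pos = eigenpair(A)
lam_neg, v_neg = eigenpair(neg_A)
print("A:  lambda =", lam_pos, " residual =", residual(A, lam_pos, v_pos))
print("-A: lambda =", lam_neg, " residual =", residual(neg_A, lam_neg, v_neg))
```

For this particular A the eigenvalues are 3 and 3 \pm \sqrt{3}, so the sketch should report roughly 3 + \sqrt{3} for A and its negative for -A, with residuals near zero.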

The intuition:

Of course this proof, while rigorous, hardly gets at the appropriate intuition.  Either before or after the preceding proof a teacher would be remiss not to mention that the function Q_A(x) = F_A(x,x) has elliptical level sets (when the matrix A is positive definite), and then to give a visual demonstration of what these maxima of the function really are.  They identify the axes of the ellipse, which are the eigenvectors, while the eigenvalues determine the radius associated with each axis: on the level set Q_A(x)=1, an eigenvalue \lambda corresponds to a radius of 1/\sqrt{\lambda}.
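In lieu of a picture, here is a short Python sketch of that demonstration for one hypothetical positive definite 2\times 2 matrix (the matrix, the angular grid and the names are assumptions for illustration).  It walks around the level set Q_A(x)=1 and records the longest and shortest radii together with the directions in which they occur.

```python
import math

# Hypothetical positive definite example.  Its eigenvalues are 3 and 1,
# with eigenvectors along (1, 1) and (1, -1) respectively.
A = [[2.0, 1.0],
     [1.0, 2.0]]

def Q(A, x):
    """Quadratic form Q_A(x) = x . Ax."""
    Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
    return sum(a * b for a, b in zip(x, Ax))

# For a unit direction u, the point u / sqrt(Q_A(u)) lies on the level set
# Q_A(x) = 1, so 1 / sqrt(Q_A(u)) is the radius of the ellipse along u.
steps = 3600
radii = []
for k in range(steps):
    theta = math.pi * k / steps          # directions cover a half turn
    u = (math.cos(theta), math.sin(theta))
    radii.append((1.0 / math.sqrt(Q(A, u)), theta))

r_max, theta_max = max(radii)
r_min, theta_min = min(radii)

print("longest radius ", r_max, "at angle", math.degrees(theta_max), "degrees")
print("shortest radius", r_min, "at angle", math.degrees(theta_min), "degrees")
```

The longest radius should appear along (1,-1), the eigenvector for the smaller eigenvalue, and the shortest along (1,1), with lengths 1/\sqrt{1} and 1/\sqrt{3}.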

If someone is already familiar with the method of Lagrange multipliers, then the preceding argument can be mostly elided.  However, working around Lagrange multipliers avoids a major digression into multivariable calculus.
