I’m a firm believer that if you can’t understand some concept without resorting to complex numbers and invoking the fundamental theorem of algebra, then you don’t have a deep understanding of the concept. This is a particular problem for teaching undergraduate mathematics (especially to scientists and engineers), since some of the most crucial techniques, eigenanalysis and Fourier transforms, make heavy use of complex numbers. The result of the “complex” arguments presented is that most students have a very difficult time with these subjects. Ultimately, many treat them as magical tools that are hardly worth the time invested to understand.
Such a perspective is of course hogwash.
I know for a fact that a much more geometric (and in my opinion illuminating) interpretation of eigenanalysis is possible. However, I had never seen a clear exposition from first principles that one could reasonably expect to present in a beginning linear algebra course. Every explanation of the geometric perspective that I’ve seen has been post-hoc or relied on machinery like Lagrange multipliers—an unwelcome diversion.
I would not be surprised if someone else has come up with this proof before. However, I was not able to quickly find a reference for it. It makes no appeals to (a) the characteristic polynomial, (b) matrix polynomials, (c) complex numbers, or (d) Lagrange multipliers. I have only treated the real-symmetric case, although I expect that other arguments along similar lines are possible for non-symmetric real matrices.
Theorem: Let $A$ be an $n \times n$ real symmetric matrix. Then there exists an eigenvector $x \neq 0$ with eigenvalue $\lambda$. In other words, $Ax = \lambda x$.
We will hinge this discussion on the maximization of a function $f$, constructed from the matrix $A$. Let $f(x, y) = y^T A x$ be a real-valued function defined on input vectors $x, y \in \mathbb{R}^n$ constrained to lie on the unit sphere. In other words, we impose the constraint $|x| = |y| = 1$. To begin with, we note that the function $f$ is continuous and defined everywhere on the unit sphere. Since the sphere is a closed and bounded domain, $f$ must attain a maximum for some pair of unit vectors $(x, y)$.
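Now, I claim that if $(x, y)$ is a maximal pair, then $y$ must lie collinear with $Ax$. To see why, rewrite the dot product defining $f(x, y) = y^T A x$ as $|y| \, |Ax| \cos\theta$, where $\theta$ is the angle between $y$ and $Ax$. Since $|y| = 1$, we may independently maximize the other two terms $|Ax|$ and $\cos\theta$. Once a maximizing $x$ is chosen, $\cos\theta$ is optimized to be $1$ by choosing $y$ collinear with $Ax$. This gives us the identity $|Ax| \, y = Ax$.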
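Now, since our matrix $A$ is symmetric, $(y, x)$ must also be a maximal pair, since $f(y, x) = x^T A y = (A^T x)^T y = (Ax)^T y = y^T A x = f(x, y)$. This gives us the identity $|Ay| \, x = Ay$. In general, symmetry usually makes it possible to swap $x$ and $y$.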
Now, before we get to the finale, I’d like to take a brief break to reflect on one of our earlier observations. Recall that $f(x, y) = |y| \, |Ax| \cos\theta$. In our previous argument we concluded that both $|y| = 1$ and $\cos\theta = 1$ had better be true of any maximal pair. Although we didn’t note it at the time, this means that $f(x, y) = |Ax|$ for our maximal pair $(x, y)$. By symmetry, we can then observe that $|Ax| = f(x, y) = f(y, x) = |Ay|$. That is, $x$ and $y$ are equally stretched by $A$.
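Finally, we are ready to bring things to a head. I claim that $x + y$ is an eigenvector of $A$ with eigenvalue $\lambda = |Ax| = |Ay|$. Indeed, using $Ax = \lambda y$ and $Ay = \lambda x$, we get $A(x + y) = Ax + Ay = \lambda y + \lambda x = \lambda (x + y)$. (If it happens that $x + y = 0$, then $y = -x$, and $Ax = \lambda y$ reads $Ax = -\lambda x$, so $x$ itself is an eigenvector, with eigenvalue $-\lambda$.)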
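The argument can be checked numerically. The sketch below is only an illustration, not part of the proof: it assumes $f(x, y) = y^T A x$, uses an arbitrary example matrix, and finds a maximal pair by alternately setting $y$ collinear with $Ax$ and $x$ collinear with $Ay$. That alternating scheme, the helper name `maximal_pair`, and its parameters are my own choices for the demonstration.

```python
import numpy as np

def maximal_pair(A, iters=500, seed=0):
    """Approximate a pair of unit vectors (x, y) maximizing y^T A x.

    Hypothetical helper: alternately makes y collinear with A x and
    x collinear with A y. Assumes A is symmetric and nonzero, and that
    the random start is not mapped to zero by A.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        y /= np.linalg.norm(y)   # best y for the current x
        x = A @ y
        x /= np.linalg.norm(x)   # best x for the current y
    y = A @ x
    y /= np.linalg.norm(y)       # enforce |Ax| y = Ax for the final x
    return x, y

# Illustrative symmetric matrix (an assumption, not from the text).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

x, y = maximal_pair(A)
lam = np.linalg.norm(A @ x)      # candidate eigenvalue |Ax|
v = x + y                        # candidate eigenvector

# Residual of the eigenvector equation A v = lam v.
residual = np.linalg.norm(A @ v - lam * v)
print("lambda  :", lam)
print("residual:", residual)
```

For a matrix whose largest-magnitude eigenvalue is negative, the same procedure should return $y = -x$, so that $x + y = 0$ and one would test $x$ itself against the eigenvalue $-\lambda$ instead.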
Of course this proof, while rigorous, hardly gets at the appropriate intuition. Either before or after the preceding proof a teacher would be remiss not to mention that the function $f(x, x) = x^T A x$ has elliptical level sets (when the matrix $A$ is positive definite), and then to give a visual demonstration of what these maxima of the function really are. They identify the axes of the ellipse (the eigenvectors), while the eigenvalues reflect the radii associated with each axis.
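A rough numerical version of that visual demonstration might look like the following. It assumes the function in question is the quadratic form $x^T A x$ and uses an arbitrary positive definite example matrix. It traces the level set $x^T A x = 1$ by sampling directions on the unit circle, then compares the longest and shortest radii against NumPy's `numpy.linalg.eigh`, which is used here only as a reference.

```python
import numpy as np

# Illustrative symmetric positive definite matrix (an assumption).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# Unit directions u(theta) sampled around the circle.
theta = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
U = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# The level set x^T A x = 1 meets direction u at radius 1 / sqrt(u^T A u).
q = np.einsum("ij,jk,ik->i", U, A, U)   # u^T A u for every sampled u
r = 1.0 / np.sqrt(q)

# Reference decomposition; eigh returns eigenvalues in ascending order.
w, V = np.linalg.eigh(A)

longest = U[np.argmax(r)]    # direction of the longest semi-axis
shortest = U[np.argmin(r)]   # direction of the shortest semi-axis

print("semi-axis lengths (sampled):", r.max(), r.min())
print("1/sqrt(eigenvalues)        :", 1.0 / np.sqrt(w))
```

In this sketch the semi-axis along an eigenvector with eigenvalue $\lambda$ has length $1/\sqrt{\lambda}$, so the largest eigenvalue corresponds to the shortest axis of the ellipse.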
If someone is already familiar with the method of Lagrange multipliers, then the preceding argument can be mostly elided. However, working around them avoids a major digression into multivariable calculus.