I’m a firm believer that if you can’t understand some concept without resorting to complex numbers and invoking the fundamental theorem of algebra, then you don’t have a deep understanding of the concept. This is a particular problem for teaching undergraduate mathematics (especially to scientists and engineers), since some of the most crucial techniques, eigenanalysis and Fourier transforms, make heavy use of complex numbers. The result of the “complex” arguments presented is that most students have a very difficult time with these subjects. Ultimately, many treat them as magical tools that are hardly worth the time invested to understand.
Such a perspective is of course hogwash.
I know for a fact that a much more geometric (and in my opinion illuminating) interpretation of eigenanalysis is possible. However, I had never seen a clear exposition from first principles that one could reasonably expect to present in a beginning linear algebra course. Every explanation of the geometric perspective that I’ve seen has been post-hoc or relied on machinery like Lagrange multipliers—an unwelcome diversion.
I would not be surprised if someone else has come up with this proof before. However, I was not able to quickly find a reference for it. It makes no appeals to (a) the characteristic polynomial, (b) matrix polynomials, (c) complex numbers, or (d) Lagrange multipliers. I have only treated the real-symmetric case, although I expect that other arguments along similar lines are possible for non-symmetric real matrices.
Theorem: Let $A$ be an $n \times n$ real symmetric matrix. Then there exists an eigenvector $x$ with eigenvalue $\lambda$. In other words, $Ax = \lambda x$.
The proof:
We will hinge this discussion on the maximization of a function $f$, constructed from the matrix $A$. Let $f(x, y) = x^T A y$ be a real-valued function defined on input vectors $x, y$ constrained to lie on the unit sphere. In other words, we impose the constraint $|x| = |y| = 1$. To begin with, we note that the function $f$ is continuous and defined everywhere on the unit sphere. Since the sphere is closed and bounded (that is, compact), $f$ must attain a maximum for some pair of unit vectors $(x, y)$.
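Now, I claim that if $(x, y)$ is a maximal pair, then $x$ must lie collinear with $Ay$. To see why, rewrite the dot product defining $f$ as $f(x, y) = x \cdot (Ay) = |x| |Ay| \cos\theta$, where $\theta$ is the angle between $x$ and $Ay$. Since $|x| = 1$, we may independently maximize the other two terms $|Ay|$ and $\cos\theta$. Once a $y$ maximizing $|Ay|$ is chosen, $\cos\theta$ is optimized to be $1$ by choosing $x$ collinear with $Ay$. This gives us the identity $Ay = |Ay| x$.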
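The collinearity step can be checked numerically. What follows is a minimal sketch, assuming $f(x, y) = x^T A y$ (the dot product of $x$ with $Ay$) and a hypothetical 3×3 example matrix; the helper names `matvec`, `dot`, `norm`, `normalize`, and `random_unit` are illustrative only. It fixes a unit vector $y$, picks $x$ in the direction of $Ay$, and compares the result against randomly sampled unit vectors.

```python
import math
import random

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def normalize(v):
    n = norm(v)
    return [a / n for a in v]

def random_unit(n, rng):
    """Random unit vector: Gaussian coordinates, then normalized."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

# Hypothetical example matrix (invertible, so Ay is never zero).
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]

rng = random.Random(0)
y = random_unit(3, rng)
Ay = matvec(A, y)

x_best = normalize(Ay)      # x collinear with Ay, so cos(theta) = 1
f_best = dot(x_best, Ay)    # should equal |Ay| up to rounding

# No other unit vector x should beat the collinear choice.
f_others = [dot(random_unit(3, rng), Ay) for _ in range(1000)]
print(f_best, norm(Ay), max(f_others))
```

The first two printed numbers should agree, and the third should not exceed them. This only illustrates the inequality on samples; the proof itself rests on the dot-product formula.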
Now, since our matrix is symmetric, $(y, x)$ must also be a maximal pair, since $f(y, x) = y^T A x = (y^T A x)^T = x^T A^T y = x^T A y = f(x, y)$. This gives us the identity $Ax = |Ax| y$. In general, symmetry usually makes it possible to swap $x$ and $y$.
Now, before we get to the finale, I’d like to take a brief break to reflect on one of our earlier observations. Recall that $f(x, y) = |x| |Ay| \cos\theta$, with $\theta$ the angle between $x$ and $Ay$. In our previous argument we concluded that both $|x| = 1$ and $\cos\theta = 1$ better be true of any maximal pair. Although we didn’t note it at the time, this means that $f(x, y) = |Ay|$ for our maximal pair $(x, y)$. By symmetry, we can then observe that $|Ay| = f(x, y) = f(y, x) = |Ax|$. That is, $x$ and $y$ are equally stretched by $A$.
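Finally, we are ready to bring things to a head. I claim that $x + y$ is an eigenvector of $A$ with eigenvalue $\lambda = |Ax| = |Ay|$. Indeed, adding the two identities $Ax = |Ax| y$ and $Ay = |Ay| x$ gives $A(x + y) = \lambda y + \lambda x = \lambda (x + y)$. (If $x + y = 0$, then $y = -x$ and the identity $Ax = |Ax| y$ reads $Ax = -\lambda x$, so $x$ itself is an eigenvector, with eigenvalue $-\lambda$.)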
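To see the whole argument in action, here is a hedged numerical sketch in plain Python. It assumes $f(x, y) = x^T A y$ and searches for a maximal pair by alternating the two collinearity updates $x \leftarrow Ay/|Ay|$ and $y \leftarrow Ax/|Ax|$. That search is a swapped-in heuristic (it amounts to power iteration on $A^2$), not something the proof uses: the proof only needs a maximum to exist. The function names and both example matrices are hypothetical.

```python
import math

def matvec(A, v):
    """Multiply matrix A (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def normalize(v):
    n = norm(v)
    return [a / n for a in v]

def maximal_pair(A, start, steps=500):
    """Heuristic search for a maximal pair (x, y) of f(x, y) = x^T A y.

    Alternates x <- Ay/|Ay| and y <- Ax/|Ax|. Assumes Ay and Ax never
    vanish along the way and that the iteration has converged.
    """
    y = normalize(start)
    for _ in range(steps):
        x = normalize(matvec(A, y))
        y = normalize(matvec(A, x))
    return x, y

def eigenpair_from_maximal_pair(A, x, y, tol=1e-9):
    """Turn a maximal pair into an (eigenvalue, unit eigenvector) pair."""
    lam = norm(matvec(A, x))            # |Ax|, equal to |Ay| at a maximal pair
    s = [a + b for a, b in zip(x, y)]
    if norm(s) > tol:
        return lam, normalize(s)        # A(x + y) = lam (x + y)
    return -lam, x                      # degenerate case y = -x: Ax = -lam x

def residual(A, lam, v):
    """Size of Av - lam v; near zero for a true eigenpair."""
    Av = matvec(A, v)
    return norm([a - lam * b for a, b in zip(Av, v)])

# Hypothetical symmetric examples.
A1 = [[2.0, 1.0], [1.0, 2.0]]     # eigenvalues 3 and 1
A2 = [[-2.0, 0.0], [0.0, 1.0]]    # eigenvalues -2 and 1; triggers y = -x

lam1, v1 = eigenpair_from_maximal_pair(A1, *maximal_pair(A1, [1.0, 0.0]))
lam2, v2 = eigenpair_from_maximal_pair(A2, *maximal_pair(A2, [1.0, 1.0]))
print(lam1, v1, residual(A1, lam1, v1))
print(lam2, v2, residual(A2, lam2, v2))
```

For the first matrix the sketch should recover an eigenvalue near 3 from $x + y$. For the second, the maximal pair satisfies $y = -x$, so $x + y$ vanishes and the eigenvalue near $-2$ comes from $x$ itself.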
The intuition:
Of course this proof, while rigorous, hardly gets at the appropriate intuition. Either before or after the preceding proof a teacher would be remiss not to mention that the function $f(x, x) = x^T A x$ has elliptical level sets (when the matrix $A$ is positive definite), and then to give a visual demonstration of what these maxima of the function really are. They identify the axes of the ellipse (the eigenvectors), while the eigenvalues set the radius along each axis: on the level set $x^T A x = 1$, the axis with eigenvalue $\lambda$ has radius $1/\sqrt{\lambda}$.
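In place of a picture, a small numerical sketch can stand in for the visual demonstration. It assumes the quadratic form $q(v) = v^T A v$ and a hypothetical positive definite 2×2 matrix with eigenvalues 3 and 1, and it simply samples unit vectors around the circle. The names `quad_form`, `semi_minor`, and `semi_major` are illustrative.

```python
import math

# Hypothetical positive definite example; its eigenvalues are 3 and 1.
A = [[2.0, 1.0],
     [1.0, 2.0]]

def quad_form(A, v):
    """q(v) = v^T A v."""
    Av = [sum(a * b for a, b in zip(row, v)) for row in A]
    return sum(a * b for a, b in zip(v, Av))

# Sample unit vectors v(t) = (cos t, sin t) around the circle.
n = 3600
samples = []
for k in range(n):
    t = 2.0 * math.pi * k / n
    v = (math.cos(t), math.sin(t))
    samples.append((quad_form(A, v), t))

q_max, t_max = max(samples)   # largest value of q on the unit circle
q_min, t_min = min(samples)   # smallest value of q on the unit circle

# On the level set q(v) = 1 (an ellipse), the point in unit direction u
# lies at distance 1 / sqrt(q(u)) from the origin.
semi_minor = 1.0 / math.sqrt(q_max)   # shortest radius, along the top eigenvector
semi_major = 1.0 / math.sqrt(q_min)   # longest radius, along the other eigenvector

print(q_max, (math.cos(t_max), math.sin(t_max)))
print(q_min, (math.cos(t_min), math.sin(t_min)))
print(semi_minor, semi_major)
```

For this matrix $q(v(t)) = 2 + \sin 2t$, so the extremes sit on the diagonals: the maximizing direction is parallel to $(1, 1)$ and the minimizing one to $(1, -1)$, which are the two axes of the ellipse.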
If someone is already familiar with the method of Lagrange multipliers, then the preceding argument can be mostly elided. However, working around them avoids a major digression into multivariable calculus.
I’m a little confused… if a real matrix is not symmetric, there is no guarantee of a real eigenvector (e.g. take a 2-d rotation matrix), so what did you mean “I expect that other arguments along similar lines are possible for non-symmetric real matrices”?
Btw, I really like this proof and my favorite thing about it is that it gave me a new point of view on the connection between a real symmetric matrix as a bilinear form and as a linear transformation. (How the vectors that maximize its value as a bilinear form are related to its eigenvectors…)
Hi Ben, I put up a new post with an answer a few days ago.