Ben Blum-Smith asked the question I’ve been pondering since my last post:

I’m a little confused… if a real matrix is not symmetric, there is no guarantee of a real eigenvector (e.g. take a 2-d rotation matrix), so what did you mean “I expect that other arguments along similar lines are possible for non-symmetric real matrices”?

As it turns out, I’ve been spending quite some time repairing my own understanding of linear algebra.  What I should have said is, I expect to find similar arguments for normal matrices.

Normal is a terrible name of course.  Are other matrices weird? non-conformist?

Unfortunately, I haven’t figured out how to pin down a similarly elementary-flavored proof for normal matrices.  Instead, I’m going to present an argument for another branch of the family tree (so to speak): orthonormal matrices.

But first, I realized my last post might have jumped in a bit too deep all at once, so I’m going to try to err the other way and cover some more basic linear algebra this time as well.  Bear with me and let me define orthonormal matrices before I talk about their eigenanalysis.  I’ll give two definitions and suggest why they’re equivalent.

One standard approach is to think of the columns as vectors in $\mathbb{R}^n$.  For instance, take the vector space $\mathbb{R}^3$, a.k.a. “yes, that convenient mathematical model for the space we live in.” We might have the $3\times 3$ matrix (courtesy Wikipedia)
We say a square matrix like this is orthonormal when the column vectors are all length $1$ and are pairwise orthogonal.  More symbolically, for every column vector $x$, $|x|=1$ (equivalently $x\cdot x=1$) and for any two distinct column vectors $x$ and $y$, their dot product is $x\cdot y = 0$.  This gives a simple computation we can perform to check that the above matrix is in fact orthonormal.
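This check is easy to run by machine.  The Wikipedia example matrix isn’t reproduced above, so in the sketch below a rotation about the $z$-axis stands in for it (any rotation matrix will do):

```python
import numpy as np

# Stand-in for the example matrix: a rotation about the z-axis.
theta = np.pi / 6
A = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta),  np.cos(theta), 0.0],
    [0.0,            0.0,           1.0],
])

for i in range(3):
    for j in range(3):
        d = A[:, i] @ A[:, j]            # dot product of columns i and j
        target = 1.0 if i == j else 0.0  # 1 on the diagonal, 0 off it
        assert abs(d - target) < 1e-12

print("columns are unit length and pairwise orthogonal")
```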

But, I’m a graphics person, so let’s make a picture.  I’ll go ahead and plot the three column vectors (labeled $V1, V2, V3$) relative to the standard basis vectors $X=(1,0,0)$, $Y=(0,1,0)$ and $Z=(0,0,1)$.

As you can see, the column vectors appear to be unit length and orthogonal to each other.  If we place them in correspondence with the three standard coordinate vectors (axes) then we can think of the column vector basis as a rotated image of the standard basis.  That is, our matrix is encoding a rotation transformation.  This is how we get our second definition of an orthonormal matrix.

If we want the effect of multiplying by an orthonormal matrix to be a “rotation” (whatever that means in $n$-dimensional space) then it should probably have the following properties: (A) transforming a vector shouldn’t change its length, and (B) if two vectors are orthogonal before transforming them with our matrix, then they had better remain orthogonal after transforming them.  These two conditions translate to the following properties:

• For all vectors $x\in\mathbb{R}^n$, $(Ax)\cdot(Ax)=x\cdot x$
• For all vectors $x,y\in\mathbb{R}^n$, $(Ax)\cdot(Ay)=x\cdot y$
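Both properties can be sanity-checked numerically.  Here is a minimal sketch using a 2D rotation matrix as the orthonormal transformation (the angle and the test vectors are arbitrary choices):

```python
import numpy as np

theta = 0.7  # arbitrary rotation angle
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
y = rng.standard_normal(2)

# Property (B): dot products are preserved
assert np.isclose((A @ x) @ (A @ y), x @ y)
# Property (A) is the special case y = x: lengths are preserved
assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))
```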

In fact, we can now see that the first condition is just the special case $y=x$ of the second.  We can express the second condition in a more algebraic-sounding way with the phrase “an orthonormal transformation is a transformation that preserves the dot product.”  We can get an even nicer algebraic characterization if we do a little symbol pushing.

$(Ax)\cdot (Ay) = (Ax)^T Ay = x^T A^T A y = x^T y$

Now, the only way that the rightmost equality is going to hold is if we can just remove $A^T A$ from the middle.  But we can do this precisely when $A^T A=I$, the identity matrix.  Aha.  That’s a great definition, now we don’t have to talk about column vectors or any sort of vectors whatsoever.  Slick!

(It turns out that you can get the equation $A^T A=I$ for orthonormal matrices from the first definition I gave too.  The trick is to realize that the $i,j$ entry of the matrix product $A^T A$ encodes the dot product between the $i$th and $j$th column vectors.)
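That parenthetical is also easy to verify by machine.  A small sketch, again assuming an arbitrary 2D rotation as our orthonormal matrix: the $(i,j)$ entry of $A^T A$ matches the dot product of columns $i$ and $j$, and the whole product collapses to the identity.

```python
import numpy as np

theta = 0.4  # arbitrary angle; any rotation matrix works here
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

G = A.T @ A
for i in range(2):
    for j in range(2):
        # entry (i, j) of A^T A is the dot product of columns i and j
        assert np.isclose(G[i, j], A[:, i] @ A[:, j])

# ...and for an orthonormal matrix those dot products form the identity
assert np.allclose(G, np.eye(2))
```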

OK, now that we’re all chummy with orthonormal matrices, I want to state the key theorem for their eigenanalysis.

Theorem: Given an $n\times n$ orthonormal matrix $A$ (with $n\geq 2$), there exists a two-dimensional subspace of $\mathbb{R}^n$ “fixed” by $A$.  In other words, there exist vectors $v_1,v_2$ so that for any linear combination $x=\alpha v_1 + \beta v_2$, the vector $Ax$ is also a linear combination of $v_1$ and $v_2$.

Since we defined an orthonormal transformation to preserve length, this means we can think of $A$’s effect or action on the subspace $Span\{v_1,v_2\}$ as a rotation.  Combined with some other (simpler) theorems, we can arrive at the punch line of orthonormal eigenanalysis:

Every $n$-dimensional rotation factors into a series of independent $2$-dimensional rotations.
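In 3D, the punch line says a rotation fixes an axis and rotates the plane orthogonal to it within itself.  Here’s a sketch of that instance; the particular rotation is an arbitrary composition, and the axis is recovered as the null space of $A-I$ via the SVD (an implementation choice, not anything forced by the theorem):

```python
import numpy as np

def rotx(t):
    return np.array([[1, 0, 0],
                     [0, np.cos(t), -np.sin(t)],
                     [0, np.sin(t),  np.cos(t)]])

def rotz(t):
    return np.array([[np.cos(t), -np.sin(t), 0],
                     [np.sin(t),  np.cos(t), 0],
                     [0, 0, 1]])

A = rotz(0.8) @ rotx(0.3)  # an arbitrary "generic" 3D rotation

# The rotation axis spans the null space of A - I; the last right
# singular vector from the SVD gives a unit vector in that null space.
_, _, Vt = np.linalg.svd(A - np.eye(3))
axis = Vt[-1]
assert np.allclose(A @ axis, axis)  # the axis is fixed

# A vector orthogonal to the axis stays orthogonal to it after rotating,
# so the plane orthogonal to the axis is rotated within itself.
u = np.cross(axis, np.array([1.0, 0.0, 0.0]))
assert abs((A @ u) @ axis) < 1e-10
```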

This is a fascinating geometric fact, but it’s usually buried behind this mess about “complex eigenvalues.”  Continuing my theme, you don’t need complex numbers, polynomials, or Lagrange multipliers to prove this theorem.

Proof:
To start, let’s just leverage our previous result about symmetric matrices.  In order to do this, we have to “symmetrize” our orthonormal matrix: $S=A + A^T$.  Since $S$ is symmetric, we know that it must have some maximal eigenvalue $\lambda$ with eigenvector $x$ so that

$\lambda x = Sx = (A+A^T)x=Ax + A^T x$

This tells us that the three vectors $x$, $Ax$, and $A^T x$ are linearly dependent.  Geometrically, they all lie in the same plane.  However, we really want to focus on $A$, not $A^T$, so let’s go ahead and “rotate” this arrangement of three vectors by $A$:

$\lambda Ax = AAx + AA^T x = AAx + x$

(You can check that $A^T A=I$ if and only if $A A^T = I$, but I’ll just assume that’s already been proven here.  You may also want to note that this is the key point at which we invoke orthonormality.)  Squint, and you should see that this equation tells us that if we keep rotating $x$ by $A$, then we start becoming linearly dependent after the second rotation.  We just need to dot our ‘i’s and cross our ‘t’s now.
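Both displayed identities can be sanity-checked numerically.  A sketch, assuming an arbitrary 3D rotation as our orthonormal matrix: symmetrize it, grab the maximal eigenpair with `numpy.linalg.eigh`, and confirm the equations hold.

```python
import numpy as np

theta = 0.9  # arbitrary angle for a stand-in orthonormal (rotation) matrix
A = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])

S = A + A.T               # the symmetrized matrix
w, V = np.linalg.eigh(S)  # eigh: eigendecomposition for symmetric matrices
lam = w[-1]               # eigh sorts ascending, so this is the maximal eigenvalue
x = V[:, -1]              # ...and a corresponding eigenvector

# lambda x = A x + A^T x: the three vectors are coplanar
assert np.allclose(lam * x, A @ x + A.T @ x)
# After "rotating" everything by A: lambda A x = A A x + x
assert np.allclose(lam * (A @ x), A @ (A @ x) + x)
```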

Let $v_1 = x$ and $v_2 = Ax$.  Then,

$A(\alpha v_1 + \beta v_2) = \alpha Ax + \beta AAx = \alpha Ax + \beta\lambda Ax - \beta x = (-\beta) v_1 + (\alpha + \beta\lambda) v_2$

which is again a linear combination of $v_1$ and $v_2$, so $A$ fixes $Span\{v_1,v_2\}$ as claimed. $\blacksquare$
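Here’s the whole proof run numerically on one example.  I’m assuming a block-diagonal 4D rotation (two independent plane rotations, angles chosen arbitrarily) so that the maximal eigenvector $x$ of the symmetrized matrix lands in a genuine rotation plane and $v_1, v_2$ really span two dimensions:

```python
import numpy as np

def rot2(t):
    return np.array([[np.cos(t), -np.sin(t)],
                     [np.sin(t),  np.cos(t)]])

# A 4D rotation built from two independent 2D plane rotations
A = np.zeros((4, 4))
A[:2, :2] = rot2(0.5)
A[2:, 2:] = rot2(1.2)

S = A + A.T
w, V = np.linalg.eigh(S)
lam, x = w[-1], V[:, -1]   # maximal eigenpair of the symmetrized matrix

v1, v2 = x, A @ x
alpha, beta = 0.3, -1.7    # an arbitrary linear combination
lhs = A @ (alpha * v1 + beta * v2)
rhs = (-beta) * v1 + (alpha + beta * lam) * v2
assert np.allclose(lhs, rhs)  # the image stays inside Span{v1, v2}
```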

There is obviously a lot more to be said about the consequences of this theorem for $2$, $3$, and $n$ dimensional rotations, but I’ve already said more than enough for a single post.