I'll make here a very informal attempt at explaining what the eigenvectors of the smallest and greatest eigenvalues have to do with minimizing (or maximizing) $v^{T}Av$ over unit vectors $v$. It will be neither rigorous nor will it cover all cases, but I certainly hope it will enlighten you.

### Decompositions

Matrices can often be re-expressed as a product of two, three or more other matrices, usually chosen to have "nice" properties. This re-expression is called a *decomposition*, and the act of re-expressing is called *decomposing*. There are many decompositions, and they are classified by the kind of "nice" properties that the resultant product matrices have. Some of these decompositions always exist, while some are applicable only to a few select types of matrices, and some will produce results with "nicer" properties if their input is nicer too.

### Eigendecomposition

We'll here be interested in one particular decomposition. It's called the *eigendecomposition*, or alternatively *spectral decomposition*. It takes a matrix $A$ and decomposes it into a matrix product $A = Q \Lambda Q^{-1}$. Both $Q$ and $\Lambda$ have the following "nice" properties:

- $Q$'s columns contain the eigenvectors of $A$.
- $\Lambda$ is a diagonal matrix containing, for each eigenvector, the corresponding eigenvalue of $A$.
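As a quick numerical sketch of this (using NumPy; the matrix here is an arbitrary example I made up, not anything special):

```python
import numpy as np

# An arbitrary symmetric example matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])

# np.linalg.eig returns the eigenvalues and a matrix Q whose columns
# are the corresponding eigenvectors.
eigvals, Q = np.linalg.eig(A)
Lam = np.diag(eigvals)

# Reassemble A from its eigendecomposition: A = Q Λ Q⁻¹.
A_rebuilt = Q @ Lam @ np.linalg.inv(Q)
print(np.allclose(A, A_rebuilt))  # True
```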

### Nice properties of Eigendecomposition

As mentioned previously, decompositions can have nicer properties if they have nice input. As it happens, we do have (very) nice input: $A$ is real and symmetric. Under those conditions, the result of the eigendecomposition has the following extra "nice" properties:

- $\Lambda$'s entries are all real numbers; therefore, the eigenvalues of $A$ are real numbers.
- $Q$ is *orthogonal*; therefore, $Q$'s columns, the eigenvectors, are all unit vectors that are orthogonal to each other. This makes the $Q^{-1}$ matrix easy to compute once you have $Q$: it's just $Q^{-1}=Q^{T}$.
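A quick check of these extra properties (again a NumPy sketch with a made-up symmetric matrix; `np.linalg.eigh` is the variant specialized for symmetric/Hermitian input):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])  # real and symmetric

# eigh guarantees real eigenvalues and orthonormal eigenvectors.
eigvals, Q = np.linalg.eigh(A)

print(np.allclose(Q.T @ Q, np.eye(2)))     # True: Q is orthogonal
print(np.allclose(np.linalg.inv(Q), Q.T))  # True: Q⁻¹ = Qᵀ
```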

### Multiplying a vector by a matrix

Let's change tracks for a moment. Suppose now you have an $n$-row matrix $M$. When you carry out a matrix-vector multiplication $v' = Mv$, you compute $n$ dot products: $v \cdot M_\textrm{row1}$, $v \cdot M_\textrm{row2}$, ..., $v \cdot M_\textrm{row$n$}$, which become a new $n$-element vector $v'$.

$$M = \left( \begin{array}{c} M_{\textrm{row$_1$}} \\ M_{\textrm{row$_2$}} \\
\vdots \\ M_{\textrm{row$_n$}} \end{array} \right)$$
$$v' = Mv = \left( \begin{array}{c} M_{\textrm{row$_1$}} \\ M_{\textrm{row$_2$}} \\
\vdots \\ M_{\textrm{row$_n$}} \end{array} \right) v = \left( \begin{array}{c} M_{\textrm{row$_1$}} \cdot v \\ M_{\textrm{row$_2$}} \cdot v \\
\vdots \\ M_{\textrm{row$_n$}} \cdot v \end{array} \right)$$

As you know, $a \cdot b$, where $b$ is a unit vector, calculates the *projection* of $a$ onto $b$; in other words, how much they overlap or shadow each other.

Therefore, by doing that matrix-vector multiplication $Mv$, you've re-expressed $v$ in terms of its $n$ projections onto the $n$ row vectors of $M$. You can think of that as a re-encoding/re-expression/compression of $v$ of sorts. The word "compression" is especially apt, since some information may be lost; this will be the case if $M$ isn't square or its rows aren't orthogonal to each other.
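Here's that view of matrix-vector multiplication spelled out numerically (a NumPy sketch with an arbitrary example matrix):

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])  # n = 3 rows
v = np.array([1.0, -1.0])

# Each entry of Mv is the dot product of v with one row of M.
by_rows = np.array([row @ v for row in M])
print(np.allclose(M @ v, by_rows))  # True
```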

### Multiplying a vector by an orthogonal matrix

But what if $M$ *is* orthogonal? Then all its rows are orthogonal to each other, and if so, *it's actually possible to not lose any information at all when doing $v' = Mv$, and it's possible to recover the original vector $v$ from $v'$ and $M$!*

Indeed, take an $n \times n$ orthogonal matrix $M$ and an $n$-element vector $v$, and compute $v' = Mv$. You'll compute the $n$ projections of $v$ onto the $n$ rows of $M$, and because these rows of $M$ are orthogonal to each other, the projections of $v$ (stored as elements of $v'$) contain no redundant information between themselves, and therefore nothing had to be pushed out or damaged to make space.

And because you've losslessly encoded the vector $v$ as a vector $v'$ of projections onto the rows of $M$, it's possible to *recreate* $v$, by doing the reverse: Multiplying the rows of $M$ by the projections in $v'$, and summing them up!

To do that, we must transpose $M$, since we're now left-multiplying $v'$ by it. Whereas we previously had the rows of $M$ where we conveniently wanted them (as rows) to do the encoding, we must now have them as columns in order to decode $v'$ back into $v$. Whence we get

$$v = M^T v'$$
$$v = M^T (Mv)$$
$$v = (M^T M)v$$
$$v = v$$

As an aside, this is the reason why orthogonal matrices $Q$ have the property

$$I = QQ^T = Q^T Q$$

And hence why

$$Q^{-1}=Q^T,$$

as I pointed out earlier.
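The whole encode/decode round trip in NumPy (a sketch; I use a 2-D rotation as the orthogonal matrix, but any orthogonal $M$ works):

```python
import numpy as np

# A rotation matrix is a simple concrete example of an orthogonal matrix.
theta = 0.7
M = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

v = np.array([3.0, 4.0])
v_enc = M @ v        # encode: projections of v onto the rows of M
v_dec = M.T @ v_enc  # decode: sum the rows of M weighted by those projections
print(np.allclose(v_dec, v))  # True: nothing was lost
```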

### Recap

Given an orthogonal matrix $M$ and vector $v$:

Encoding $v \to v'$ as projections onto *rows* of $M$ ($v' = Mv$):
$$M = \left( \begin{array}{c} M_{\textrm{row$_1$}} \\ M_{\textrm{row$_2$}} \\
\vdots \\ M_{\textrm{row$_n$}} \end{array} \right)$$
$$v' = Mv = \left( \begin{array}{c} M_{\textrm{row$_1$}} \\ M_{\textrm{row$_2$}} \\
\vdots \\ M_{\textrm{row$_n$}} \end{array} \right) v = \left( \begin{array}{c} M_{\textrm{row$_1$}} \cdot v \\ M_{\textrm{row$_2$}} \cdot v \\
\vdots \\ M_{\textrm{row$_n$}} \cdot v \end{array} \right)$$

Decoding $v' \to v$ by multiplying the *rows* of $M$ by the projections onto them, and summing up ($v = M^{T}v'$).
$$M^T = \left( \begin{array}{cccc} M_{\textrm{row$_1$}}^T & M_{\textrm{row$_2$}}^T &
\cdots & M_{\textrm{row$_n$}}^T \end{array} \right)$$
$$v = M^{T}v' = \left( \begin{array}{cccc} M_{\textrm{row$_1$}}^T & M_{\textrm{row$_2$}}^T &
\cdots & M_{\textrm{row$_n$}}^T \end{array} \right) v'$$
$$= \left( \begin{array}{cccc} M_{\textrm{row$_1$}}^T & M_{\textrm{row$_2$}}^T &
\cdots & M_{\textrm{row$_n$}}^T \end{array} \right) \left( \begin{array}{c} M_{\textrm{row$_1$}} \cdot v \\ M_{\textrm{row$_2$}} \cdot v \\
\vdots \\ M_{\textrm{row$_n$}} \cdot v \end{array} \right)$$
$$= (M_{\textrm{row$_1$}} \cdot v)M_{\textrm{row$_1$}}^T + (M_{\textrm{row$_2$}} \cdot v)M_{\textrm{row$_2$}}^T +
\cdots + (M_{\textrm{row$_n$}} \cdot v)M_{\textrm{row$_n$}}^T$$
$$=v$$

### Multiplying a vector by an eigendecomposed matrix

We now get to the crux of my argument. Suppose now we don't treat that matrix $A$ from so long ago as a black box, but instead look under the hood, at its eigendecomposition $A = Q\Lambda Q^{-1}$, or in this particular case $A = Q\Lambda Q^{T}$. See those orthogonal $Q$'s sandwiching a diagonal matrix? Well, $Q$ has the eigenvectors in its columns, so $Q^T$ will have them in its rows. We've seen how a $Q$-$Q^T$ or $Q^T$-$Q$ sandwich essentially encodes/decodes, and we're here encoding/decoding over $A$'s eigenvectors. The only twist here is this extra $\Lambda$ matrix!

What it effectively does is that after encoding but before decoding, it scales each of the components of the encoded vector independently, and then hands off to the decoding matrix.

To maximize $v^T A v = v^T Q \Lambda Q^T v$, our goal is therefore **to choose $v$ such that when it is encoded, all its energy is in the component that is then scaled by the largest eigenvalue in $\Lambda$!** And how do we achieve that? **By choosing it to be the eigenvector corresponding to that eigenvalue!** This maximizes the value of the dot product for that component in the encoding step, and this maximally large value gets scaled by the largest eigenvalue we have access to. If we fail to align 100% of the energy of $v$ with the eigenvector of greatest eigenvalue, then when $v$ is encoded, some of that energy will bleed out into other components, be multiplied by a lesser eigenvalue, and the result won't be the maximum possible anymore.
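You can watch this happen numerically (a NumPy sketch with an arbitrary symmetric matrix): the eigenvector of the largest eigenvalue attains that eigenvalue as its value of $v^T A v$, and no randomly chosen unit vector beats it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])  # real symmetric

eigvals, Q = np.linalg.eigh(A)        # eigh sorts eigenvalues ascending
v_best = Q[:, -1]                     # eigenvector of the largest eigenvalue
best = v_best @ A @ v_best
print(np.isclose(best, eigvals[-1]))  # True: vᵀAv equals the top eigenvalue

# No other unit vector does better:
for _ in range(1000):
    v = rng.standard_normal(3)
    v /= np.linalg.norm(v)            # normalize to a unit vector
    assert v @ A @ v <= best + 1e-9
```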

### Example

$$A = Q\Lambda Q^T$$
$$A = \left(\begin{array}{ccc} \vec e_1 & \vec e_2 & \vec e_3 \end{array}\right) \left(\begin{array}{ccc} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \end{array}\right) \left(\begin{array}{c} \vec e_1^T \\ \vec e_2^T \\ \vec e_3^T \end{array}\right)$$

Suppose $\sigma_1 = 2$, $\sigma_2 = 5$, $\sigma_3 = 4$. Then the largest eigenvalue is $\sigma_2$, and we want as much as possible (in fact, all) of our unit vector's energy to be scaled by $\sigma_2$. How do we do that? Well, we choose the unit vector parallel to $\vec e_2$! Thereafter we get

$$v^T A v = v^T Q \Lambda Q^T v$$
With $v = \vec e_2$, we have
$$= \vec e_2^T \left(\begin{array}{ccc} \vec e_1 & \vec e_2 & \vec e_3 \end{array}\right) \left(\begin{array}{ccc} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 4 \end{array}\right) \left(\begin{array}{c} \vec e_1^T \\ \vec e_2^T \\ \vec e_3^T \end{array}\right) \vec e_2$$
$$= \left(\begin{array}{ccc} \vec e_2 \cdot \vec e_1 & \vec e_2 \cdot \vec e_2 & \vec e_2 \cdot \vec e_3 \end{array}\right) \left(\begin{array}{ccc} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 4 \end{array}\right) \left(\begin{array}{c} \vec e_1 \cdot \vec e_2 \\ \vec e_2 \cdot \vec e_2 \\ \vec e_3 \cdot \vec e_2 \end{array}\right)$$
$$= \left(\begin{array}{ccc} 0 & 1 & 0 \end{array}\right) \left(\begin{array}{ccc} 2 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 4 \end{array}\right) \left(\begin{array}{c} 0 \\ 1 \\ 0 \end{array}\right)$$

We're successful! Because our choice of $v$ is parallel to $\vec e_2$ (the 2nd eigenvector), their dot product was a perfect 1, and so 100% of the energy of our vector went into the second component, the one that will be multiplied by $\sigma_\textrm{max} = \sigma_2$! We continue:

$$= \left(\begin{array}{c} 0 & 1 & 0 \end{array}\right) \left(\begin{array}{c} 0 \\ 5 \\ 0 \end{array}\right)$$
$$= 5 = \sigma_2$$

We've achieved the maximum gain $G$ possible, $G=\sigma_2=5$!
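The same worked example can be reproduced numerically (a NumPy sketch; I build an arbitrary orthonormal $Q$ via QR, since the example doesn't fix particular eigenvectors):

```python
import numpy as np

# Build a matrix with eigenvalues σ₁=2, σ₂=5, σ₃=4 from a random orthonormal Q.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # QR yields an orthogonal Q
Lam = np.diag([2.0, 5.0, 4.0])
A = Q @ Lam @ Q.T

v = Q[:, 1]  # e₂, the eigenvector of the largest eigenvalue σ₂ = 5
print(np.isclose(v @ A @ v, 5.0))  # True: the maximum gain G = σ₂ = 5
```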

A similar logic can be applied to the case of the minimum: choosing $v$ to be the eigenvector of the smallest eigenvalue minimizes $v^T A v$.