In the last month I studied the spectral theorems and I formally understood them. But I would like some intuition about them. If you didn’t know the spectral theorems, how would you come up with the idea that symmetric/normal endomorphisms are the only orthogonally diagonalizable endomorphisms in the real/complex case? How would you even come up with the idea of studying the adjoint?

Francesco Scavella
  • This is a bit of a stupid answer, but: the first intuition, when you learn about eigenvalues and eigenvectors, is of course that every map is diagonalizable. Understanding why and how this first intuition fails is quite a challenge, but it falls outside the scope of your question. If you have struggled with that long enough, you have probably also seen Jordan normal forms at some point. And then one can wonder: in what way do diagonal matrices look different from non-diagonalizable matrices in Jordan form? Obviously the answer is: they are their own transpose! – Vincent Jul 19 '20 at 20:05
  • Two remarks: the spectral theorems started with Hilbert, who also named them like this. Afterwards it was found that the physical spectra of atoms could be explained by them (this is a pure coincidence!). I also share your concern, and I believe that if the singular value decomposition (SVD) had been known first (and didn't need the spectral decomposition in its proof), nobody would have bothered with self-adjoint operators, since the SVD gives a very clear picture of operators (also much better than the Jordan form). The Jordan form and the spectral theorem are basically relevant if you treat it as an algebra. – lalala Jul 19 '20 at 22:43
  • You've been really kind, but I haven't studied the Jordan normal form (I was planning to study it). Maybe I'll answer you after I know a bit more about it. – Francesco Scavella Jul 20 '20 at 06:18
  • @lalala: that's a bold statement. The SVD has no obvious generalization to non-compact operators, so you are saying that the work of the last 90 years on unbounded operators, on C$^*$-algebras, on von Neumann algebras, and related subjects was misguided? – Martin Argerami Jul 20 '20 at 08:09
  • My guess would be that symmetric matrices would have come naturally out of the study of quadratic forms, e.g. in the context of differential geometry. – Ben Grossmann Jul 20 '20 at 19:38

3 Answers


Regarding the adjoint, suppose you have vector spaces $X$ and $Y$ (over the same field), and a linear map $$ T:X\to Y $$ Write $X^*$ and $Y^*$ for the dual spaces. Then $T$ naturally induces a map $$ T^*:Y^* \to X^* $$ defined by $$ T^*(\phi):=\phi\circ T $$ This makes sense, because if $\phi$ is a linear functional on $Y$, then $\phi\circ T$ is a linear functional on $X$. Moreover, the function $T^*$ is also a linear transformation. This $T^*$ is called the adjoint of $T$ (there is a slight abuse of notation/terminology here, I'll elaborate on this in a moment). This is an example of what is called functorial behaviour. Taking adjoints is an example of what is called a contravariant functor.
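
In coordinates this is just the transpose. As a sketch (the bases and the matrix $A$ are notation I am introducing only for this illustration): choose bases of $X$ and $Y$, let $A$ be the matrix of $T$, and write a functional $\phi\in Y^*$ as a row vector acting on column vectors. Then $$ (T^*\phi)(x)=\phi(Tx)=\phi\,(A x)=(\phi A)\,x, $$ so $T^*$ sends the row vector $\phi$ to the row vector $\phi A$. Written with column vectors in the dual bases, that is multiplication by $A^{\mathsf T}$. No inner product and no complex conjugation are involved at this stage.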

Now, suppose that $X$ and $Y$ are finite-dimensional inner product spaces. Then you know that $X$ and $X^*$ can be canonically identified with each other. On the one hand, any $x\in X$ gives rise to a linear functional $\phi_x\in X^*$ defined by $$ \phi_x(v):=\langle v,x\rangle $$ Write $S_X:X\to X^*$ for the map that sends $x$ to $\phi_x$. It is easy to verify that $S_X$ is conjugate linear, i.e. $S_X(x+x')=S_X(x)+S_X(x')$ and $S_X(\alpha x)=\bar \alpha S_X(x)$.

On the other hand, given any $\phi\in X^*$, one can show that there exists (a unique) vector $x_\phi\in X$ such that, for every $v\in X$, $$ \phi(v)=\langle v, x_\phi\rangle $$ This shows that the function $S_X$ above is invertible, so it is "almost" an isomorphism, except for the fact that it is not strictly linear, but conjugate linear.

Now, the same thing can be done with $Y$, and we obtain a conjugate isomorphism $S_Y:Y\to Y^*$.

Consider now the composition $$ Y\overset{S_Y}{\longrightarrow} Y^*\overset{T^*}{\longrightarrow} X^* \overset{S^{-1}_X}{\longrightarrow} X $$ Call this composition $\hat T$, i.e. $\hat T(y)=(S^{-1}_X\circ T^*\circ S_Y)(y)$. You can check that $\hat T$ is linear.

Fix $x\in X$ and $y\in Y$. Put $\phi=(T^*\circ S_Y) y\in X^*$. Now, $S_X^{-1}\phi$ is, by definition, the unique vector $z\in X$ such that $\langle v,z\rangle =\phi (v)$ for every $v\in X$. Therefore, $$ \langle x,\hat Ty\rangle =\langle x,S^{-1}_X\phi\rangle=\phi(x) $$ Now, $\phi=T^*(S_Yy)=(S_Yy)\circ T$. So, $$ \phi(x)=(S_Yy)(Tx) $$ Now, $S_Yy\in Y^*$ is the linear functional that sends a vector $u\in Y$ to $\langle u,y\rangle$. This means that $$ (S_Yy)(Tx)=\langle Tx,y\rangle $$ Putting everything together, we get that $$ \langle x,\hat Ty\rangle =\langle Tx,y\rangle $$ So, $\hat T$ has the property that "the adjoint" has in every linear algebra text. In practice, we use $T^*$ to refer to the above $\hat T$, and the original $T^*$ is left behind. I will be following this convention from now on, i.e. every $T^*$ in what follows really means $\hat T$. I should mention that having an inner product is key for all of this. For a general vector space, $X$ need not be isomorphic to $X^*$ (and in finite dimensions, where it is, there is no canonical isomorphism without extra structure such as an inner product).
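
As a quick numerical sanity check of the identity $\langle x,\hat Ty\rangle=\langle Tx,y\rangle$, here is a minimal sketch in plain Python. It assumes $X=\mathbb{C}^2$ and $Y=\mathbb{C}^3$ with the standard inner products (linear in the first slot, matching $\phi_x(v)=\langle v,x\rangle$ above), and that $\hat T$ is represented by the conjugate transpose of the matrix of $T$. The helper names `inner`, `matvec`, `conj_transpose` and the sample data are made up for the illustration.

```python
# Sanity check of <x, T_hat y> = <T x, y> for one sample matrix.
# Assumption: standard inner product on C^n, linear in the first slot
# and conjugate linear in the second, as in phi_x(v) = <v, x>.

def inner(u, v):
    """Standard inner product <u, v> = sum_k u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

def conj_transpose(M):
    """Conjugate transpose of M (list of rows)."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

# Hypothetical sample data: T maps C^2 -> C^3.
T = [[1 + 2j, 0j],
     [3j, 1 - 1j],
     [2 + 0j, -1j]]
T_hat = conj_transpose(T)          # maps C^3 -> C^2

x = [1 - 1j, 2 + 0j]               # vector in X = C^2
y = [0.5j, 1 + 0j, -2 + 1j]        # vector in Y = C^3

lhs = inner(x, matvec(T_hat, y))   # <x, T_hat y>
rhs = inner(matvec(T, x), y)       # <T x, y>
adjoint_identity_holds = abs(lhs - rhs) < 1e-12
print(adjoint_identity_holds)
```

This only tests one pair $(x,y)$, so it illustrates the identity rather than proving it; the proof is the computation above.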

Regarding your question about looking at normality, recall that, given a linear operator $T:X\to X$, a subspace $W\subset X$ is said to be $T$-invariant if $$ x\in W\implies Tx\in W $$ Define the orthogonal complement $$ W^\perp:=\{x\in X: \forall w\in W,\ \langle x,w\rangle =0\} $$ Note that, if $W$ is $T$-invariant, then $W^\perp$ is $T^*$-invariant. Indeed, fix $x\in W^\perp$. We need to see that $T^*x\in W^\perp$. Let $w\in W$, then $$ \langle T^*x,w\rangle=\langle x,Tw\rangle=0 $$ because $x\in W^\perp$ and $Tw\in W$ (because $W$ is $T$-invariant). Since $w\in W$ was arbitrary, $T^*x\in W^\perp$.

If $T$ is, for example, self-adjoint, then we obviously have that $W^\perp$ is $T$-invariant whenever $W$ is. This leads to the following question: can we find an easy property for an operator $T$ which guarantees that every $T$-invariant subspace has a $T$-invariant orthogonal complement? The answer to this question is yes, and the property is normality, see here.
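
To see that this property is not automatic, here is a small worked example (the matrix is my own choice for illustration), on $\mathbb{C}^2$ with the standard inner product and standard basis $e_1,e_2$: $$ T=\begin{pmatrix}1&1\\0&1\end{pmatrix},\qquad W=\operatorname{span}\{e_1\},\qquad W^\perp=\operatorname{span}\{e_2\} $$ Here $Te_1=e_1$, so $W$ is $T$-invariant, but $Te_2=e_1+e_2\notin W^\perp$, so $W^\perp$ is not. On the other hand $T^*e_2=e_2$, so $W^\perp$ is $T^*$-invariant, as the general argument predicts. Consistently, $T$ is not normal: $$ TT^*=\begin{pmatrix}2&1\\1&1\end{pmatrix}\neq\begin{pmatrix}1&1\\1&2\end{pmatrix}=T^*T $$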

How does this relate to being diagonalizable? Well, since the matrix of $T^*$ in an orthonormal basis $B$ is the conjugate transpose of the matrix of $T$ in the same basis $B$, it follows that any operator which is diagonalizable in an orthonormal basis is necessarily normal.
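
Spelled out, assuming $B$ is an orthonormal basis of eigenvectors of $T$ with eigenvalues $\lambda_1,\dots,\lambda_n$ (notation introduced here just for this computation): $$ [T]_B=\operatorname{diag}(\lambda_1,\dots,\lambda_n),\qquad [T^*]_B=\operatorname{diag}(\overline{\lambda_1},\dots,\overline{\lambda_n}) $$ and therefore $$ [TT^*]_B=\operatorname{diag}(|\lambda_1|^2,\dots,|\lambda_n|^2)=[T^*T]_B $$ so $TT^*=T^*T$. Orthonormality of $B$ is what makes the matrix of $T^*$ the conjugate transpose; for a basis that is not orthonormal this step can fail.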

Suppose now that $T$ is normal. Pick an eigenvalue $\lambda$ of $T$ (over $\mathbb{C}$ one always exists). Let $E$ be the associated eigenspace. Clearly, $E$ is $T$-invariant. Write $$ X=E\oplus E^\perp $$ By normality, $E^\perp$ is also $T$-invariant. This means that we can consider the restricted operator $T|_{E^\perp}:E^\perp \to E^\perp$. This new operator is also normal. But $\dim (E^\perp)<\dim X$, and we can carry out an inductive argument.

  • Thank you! I already knew that way of defining the adjoint (actually it's the approach of the book that I'm using). Since we have to decompose the space into orthogonal eigenspaces (which are f-invariant), it's pretty natural to study when an endomorphism that "fixes" a subspace also "fixes" its orthogonal complement (since the sum of f-invariant subspaces is f-invariant). So if I take $w \in W$ and $w'$ in its orthogonal complement, it's pretty natural to require this "switch": $$\langle f(w'),w \rangle =\langle w',f(w) \rangle=0$$ – Francesco Scavella Jul 20 '20 at 05:59
  • And then one can ask oneself whether in general there is a map that makes this "switch" possible, and study the adjoint. – Francesco Scavella Jul 20 '20 at 06:03

Almost everything about this subject was derived in the opposite order of what you have been taught. That's why it is difficult to answer your question.

  • The infinite-dimensional case was studied for functions before the finite-dimensional case, and well before the notion of a vector space.

  • Orthogonality was noticed and defined using integral conditions about 150 years before an inner product was defined, and before finite-dimensional Linear Algebra. These observations led to the notion of a general inner product space.

  • Linearity came out of the physical condition of superposition of solutions for the Heat Equation and vibrating string problem, not the other way around.

  • Self-adjoint was defined before there was an inner-product, through Lagrange's adjoint equation, which gave, among other things, a reduction of order tool for ODEs, and a notion of "integral orthogonality."

It's all upside down from the point of view of abstraction. Asking how you might start at the lowest level of abstraction and naturally move in the more abstract direction is asking how to motivate the backwards direction from the historical forward direction that brought us to this point. It wasn't derived that way, and might never have been.

Disintegrating By Parts
  • Thank you, it was really interesting to have some historical background on this theorem. Nevertheless I think that sometimes it's useful to build up a modern intuition about concepts and not a historical one. Some teachers are really good at making the math "flow naturally", but I'm studying this particular topic as an autodidact, so sometimes it seems like the concepts came out of nowhere. An example is Lagrange's theorem in group theory. Historically it was born for groups of permutations, but I think that it is more intuitive to look at it as a natural consequence of the study of congruences. – Francesco Scavella Jul 20 '20 at 06:10
  • In a group (and my algebra teacher really underlined the importance of congruences in algebraic structures). Another example is the determinant, whose history is pretty messy and non-linear (paradoxically), but I find the modern approach of seeing it as a hypervolume of a parallelotope to be really intriguing and super-interesting. – Francesco Scavella Jul 20 '20 at 06:13
  • It's inaccurate to say the infinite-dimensional case was studied before the finite-dimensional case. Aspects of the spectral theorem for real symmetric matrices (not necessarily using the term "matrix") were studied in the early 1800s by Cauchy in his work on the principal axis theorem. See Section 4.4 of "The Mathematics of Frobenius in Context" by Hawkins or Steen's "Highlights in the History of Spectral Theory". I agree that the infinite-dimensional case was an important motivation for the development that led to the standard formulation of the finite-dimensional case. – KCd Jul 20 '20 at 15:54
  • @KCd : You are wrong. The infinite-dimensional case started with the vibrating string problem around 1750. Euler, Bernoulli, Clairaut, and Fourier were involved before 1800. – Disintegrating By Parts Jul 20 '20 at 16:11
  • I said *aspects*, not everything. The first proof that (in today's language) all eigenvalues of a real symmetric matrix are real goes back to the work of Cauchy on the principal axis theorem. The recognition that there is a connection between reality of eigenvalues and symmetry goes back earlier to work of Laplace on differential equations in celestial mechanics, not to work on vibrating strings. – KCd Jul 20 '20 at 17:35
  • @KCd "The basic concepts of spectral theory: eigenvalues, eigenfunctions, and expansions in a series of such functions were already known at the beginning of the 19th century, in the theory of Fourier series; they would form the model on which all further advances were patterned." - Jean Dieudonne "History of Functional Analysis". The vibrating string was the first problem of this type, and the idea of expanding in a trigonometric series was highly controversial, with Fourier on one side of the debate and almost everyone else on the other side. Cauchy studied this work. This work was seminal. – Disintegrating By Parts Jul 20 '20 at 18:08

To give a bit of a shorter answer, in the hermitian case observe that, if $x$ and $y$ are both eigenvectors of $A$, corresponding to the eigenvalues $\lambda$ and $\mu$, then:

$$\begin{aligned} &\langle Ax, y \rangle = \langle \lambda x, y \rangle = \lambda \langle x, y \rangle \\ &\quad= \\ &\langle x, A^*y \rangle = \langle x, A y \rangle =\langle x, \mu y \rangle = \overline\mu \langle x, y \rangle \end{aligned}$$

Hence, $(\lambda -\overline\mu) \langle x, y \rangle =0$, implying either $\lambda=\overline\mu$ or $x\perp y$. Choosing $x=y$ (so $\mu=\lambda$ and $\langle x,x\rangle\neq 0$) we find that $\lambda=\overline\lambda$, so all eigenvalues must be real. Consequently $\overline\mu=\mu$, and for $\lambda\neq\mu$ we must have $x\perp y$: the eigenspaces corresponding to different eigenvalues are orthogonal to each other.
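
To make this concrete, here is a small check in plain Python on one hermitian matrix. The matrix, the hand-computed eigenpairs and the helper names `inner` and `matvec` are my own choices for illustration; the inner product is taken linear in the first slot, as in the computation above.

```python
# Hermitian example: eigenvalues are real, and eigenvectors belonging
# to distinct eigenvalues are orthogonal. The matrix is a made-up sample.

def inner(u, v):
    """Standard inner product <u, v> = sum_k u_k * conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def matvec(M, v):
    """Matrix-vector product, with M given as a list of rows."""
    return [sum(m * c for m, c in zip(row, v)) for row in M]

A = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]             # equals its own conjugate transpose

# Eigenpairs found by hand from det(A - t I) = t^2 - 5t + 4 = (t-1)(t-4).
lam, x = 1, [1 - 1j, -1 + 0j]
mu, y = 4, [1 - 1j, 2 + 0j]

# Check A x = lam x and A y = mu y (exact here: small integer entries).
is_eigen_x = matvec(A, x) == [lam * c for c in x]
is_eigen_y = matvec(A, y) == [mu * c for c in y]

# (lam - conj(mu)) <x, y> = 0 with lam != mu forces <x, y> = 0.
orthogonal = inner(x, y) == 0
print(is_eigen_x, is_eigen_y, orthogonal)   # prints: True True True
```

Both eigenvalues, $1$ and $4$, are real, and the two eigenvectors are orthogonal even though neither was chosen with orthogonality in mind.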

From this observation alone, lots of consequences follow quite naturally. One can easily prove that in this case a full orthogonal basis of eigenvectors exists (see e.g. this writeup or try it yourself); likewise, if an orthonormal eigenbasis corresponding to real eigenvalues exists, one can easily prove that $A$ must be hermitian.

The normal case is a bit more tricky, but one can play a similar game (may expand later).
