50

Let $V \neq \{\mathbf{0}\}$ be an inner product space, and let $f:V \to V$ be a linear transformation on $V$.

I understand the definition$^1$ of the adjoint of $f$ (denoted by $f^*$), but I can't say I really grok this other linear transformation $f^*$.

For example, it is completely unexpected to me that to say that $f^* = f^{-1}$ is equivalent to saying that $f$ preserves all distances and angles (as defined by the inner product on $V$).

It is even more surprising to me to learn that to say that $f^* = f$ is equivalent to saying that there exists an orthonormal basis for $V$ that consists entirely of eigenvectors of $f$.

Now, I can follow the proofs of these theorems perfectly well, but the exercise gives me no insight into the nature of the adjoint.

For example, I can visualize a linear transformation $f:V\to V$ whose eigenvectors are orthogonal and span the space, but this visualization tells me nothing about what $f^*$ should be like when this is the case, largely because I'm completely in the dark about the adjoint in general.

Similarly, I can visualize a linear transformation $f:V\to V$ that preserves lengths and angles, but, again, and for the same reason, this visualization tells me nothing about what this implies for $f^*$.

Is there a (coordinate-free, representation-agnostic) way to interpret the adjoint that will make theorems like the ones mentioned above less surprising?


$^1$ The adjoint of $f:V\to V$ is the unique linear transformation $f^*:V\to V$ (guaranteed to exist for every such linear transformation $f$) such that, for all $u, v \in V$,

$$ \langle f(u), v\rangle = \langle u, f^*(v)\rangle \,.$$
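For concreteness, here is a minimal numerical check of this identity, as a sketch assuming $V = \mathbb{C}^n$ with the standard inner product, in which case $f^*$ is represented by the conjugate transpose of the matrix of $f$ (the numpy code and random data are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# A random linear map f on C^n, represented by a matrix A (illustrative data).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Assumption: standard inner product <u, v> = conj(u)^T v, so the adjoint of A
# is its conjugate transpose.
A_star = A.conj().T

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

lhs = np.vdot(A @ u, v)       # <f(u), v>   (np.vdot conjugates its first argument)
rhs = np.vdot(u, A_star @ v)  # <u, f*(v)>
print(np.isclose(lhs, rhs))   # True
```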

kjo
  • 13,386
  • 9
  • 41
  • 79
  • 7
    +1 for the use of *grok* (see http://en.wikipedia.org/wiki/Grok) - I'd love to use this word more often in such a parenthetical way. – Hans-Peter Stricker Jun 23 '13 at 00:06
  • 3
    @HansStricker: Glad you like *grok*! My original title for the post was "Grokking the adjoint", but afterwards I thought that this title was too confusing... – kjo Jun 23 '13 at 00:14
  • 1
    As Branimir indicates below, the adjoint is just an abstract version of the transpose. See the Wikipedia articles on [orthogonal matrices](http://en.wikipedia.org/wiki/Orthogonal_matrix) and [symmetric matrices](http://en.wikipedia.org/wiki/Symmetric_matrix) for concrete versions of some of your other statements. – Jim Belk Jun 23 '13 at 03:09
  • Surely you were reading Spivak's *Comprehensive Introduction to Differential Geometry* lately, and it put "grok" on your mind? :) – Tanner Strunk Dec 16 '17 at 01:05

4 Answers

19

For simplicity, let me consider only the finite-dimensional picture. In the infinite-dimensional world, you should consider bounded maps between Hilbert spaces, and the continuous duals of Hilbert spaces.

Recall that an inner product on a real [complex] vector space $V$ defines a canonical [conjugate-linear] isomorphism from $V$ to its dual space $V^\ast$ by $v \mapsto (w \mapsto \langle v,w\rangle)$, where I shamelessly use the mathematical physicist's convention that an inner product is linear in the second argument and conjugate-linear in the first; let us denote this isomorphism $V \cong V^\ast$ by $R$, so that $R(v)(w) := \langle v,w\rangle$.

Now, recall that a linear transformation $f : V \to W$ automatically induces a linear transformation $f^T : W^\ast \to V^\ast$, the transpose of $f$, by $\phi \mapsto \phi \circ f$; all $f^T$ does is use $f$ in the obvious way to turn functionals over $W$ into functionals over $V$, and really represents the image of $f$ through the looking glass, as it were, of taking dual spaces. However, if you have inner products on $V$ and $W$, then you have corresponding [conjugate-linear] isomorphisms $R_V : V \cong V^\ast$ and $R_W : W \cong W^\ast$, so you can use $R_V$ and $R_W$ to reinterpret $f^T : W^\ast \to V^\ast$ as a map $W \to V$, i.e., you can form $R_V^{-1} \circ f^T \circ R_W : W \to V$. If you unpack definitions, however, you'll find that $R_V^{-1} \circ f^T \circ R_W$ is none other than your adjoint $f^\ast$. So, given fixed inner products on $V$ and $W$, $f^\ast$ is simply $f^T$, arguably a more fundamental object, except reinterpreted as a map between your original vector spaces, and not their duals. If you like commutative diagrams, then $f^T$, $f^\ast$, $R_V$ and $R_W$ all fit into a very nice commutative diagram.
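To make that bookkeeping concrete, here is a small numerical sketch (my own illustration, assuming $V = \mathbb{C}^n$ and $W = \mathbb{C}^m$ with inner products $\langle x, y\rangle = x^H G y$ given by Hermitian positive-definite Gram matrices $G_V$, $G_W$). Identifying a functional with the row vector of its coefficients, $R_V(v)$ is the row $v^H G_V$ and $f^T$ acts by right-multiplication with the matrix $A$ of $f$; unwinding $R_V^{-1} \circ f^T \circ R_W$ then gives the matrix $G_V^{-1} A^H G_W$ for $f^\ast$, which reduces to the plain conjugate transpose when both Gram matrices are the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 4   # dim V = n, dim W = m  (illustrative sizes)

def random_gram(k):
    # A random Hermitian positive-definite matrix, defining <x, y> = conj(x)^T G y.
    M = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return M.conj().T @ M + k * np.eye(k)

G_V, G_W = random_gram(n), random_gram(m)
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))   # f : V -> W

# Unwinding R_V^{-1} o f^T o R_W in these coordinates gives G_V^{-1} A^H G_W.
A_star = np.linalg.solve(G_V, A.conj().T @ G_W)

def inner_V(x, y):
    return np.vdot(x, G_V @ y)   # <x, y>_V, conjugate-linear in the first slot

def inner_W(x, y):
    return np.vdot(x, G_W @ y)   # <x, y>_W

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
w = rng.standard_normal(m) + 1j * rng.standard_normal(m)
print(np.isclose(inner_W(A @ u, w), inner_V(u, A_star @ w)))   # the defining identity
```

The last line checks exactly the defining identity $\langle f(u), w\rangle_W = \langle u, f^\ast(w)\rangle_V$.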

As for the specific cases of unitary and self-adjoint operators:

  1. If you want to be resolutely geometrical about everything, the fundamental notion is not the notion of a unitary, but rather that of an isometry, i.e., a linear transformation $f : V \to V$ such that $\langle f(u),f(v)\rangle = \langle u,v\rangle$. You can then define a unitary as an invertible isometry, which is equivalent to the definition in terms of the adjoint. In fact, if you're working on finite-dimensional $V$, then you can check that $f$ is unitary if and only if it is isometric.

  2. In light of the longish discussion above, an operator $f : V \to V$ is self-adjoint if and only if $f^T : V^\ast \to V^\ast$ is exactly the same as $f$ after applying the [conjugate-linear] isomorphism $R_V : V \to V^\ast$, i.e., $f = R_V^{-1} \circ f^T \circ R_V$, or equivalently, $R_V \circ f = f^T \circ R_V$, which you can interpret as commutativity of a certain diagram. That self-adjointness implies all these nice spectral properties arguably shouldn't be considered obvious---at the end of the day, the spectral theorem, even in the finite-dimensional case, is a highly non-trivial theorem in every respect, especially conceptually!
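A quick numerical companion to both points above (my own sketch over $\mathbb{C}^n$ with the standard inner product, using numpy): a matrix with orthonormal columns preserves inner products and satisfies $Q^\ast = Q^{-1}$, and a Hermitian matrix has real eigenvalues together with an orthonormal eigenbasis, which is what `numpy.linalg.eigh` returns.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# 1. A unitary/isometry: the Q factor of a QR decomposition has orthonormal columns.
Q, _ = np.linalg.qr(Z)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
print(np.isclose(np.vdot(Q @ u, Q @ v), np.vdot(u, v)))    # preserves <.,.>
print(np.allclose(Q.conj().T @ Q, np.eye(n)))              # Q* = Q^{-1}

# 2. A self-adjoint (Hermitian) matrix: H = H*.
H = Z + Z.conj().T
eigvals, eigvecs = np.linalg.eigh(H)
print(np.allclose(np.linalg.eigvals(H).imag, 0))             # real spectrum
print(np.allclose(eigvecs.conj().T @ eigvecs, np.eye(n)))    # orthonormal eigenbasis
print(np.allclose(H @ eigvecs, eigvecs * eigvals))           # columns are eigenvectors
```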

I'm not sure that my overall spiel about adjoints and transposes is all that convincing, but I stand by my statement that the notion of isometry is the more fundamental one geometrically, one that happens to yield the notion of unitary simply out of the very definition of an adjoint, and that the spectral properties of a self-adjoint operator really are a highly non-trivial fact that shouldn't be taken for granted.

Branimir Ćaćić
  • 12,718
  • 3
  • 29
  • 54
  • Ah yes. I'll clarify that. – Branimir Ćaćić Jun 23 '13 at 02:23
  • 1
    Thanks! You write that one can "define *unitary* as an invertible isometry." What confuses me about this is that I don't see how an isometry $f:V\to V$ could fail to be invertible... Such an isometry would have to preserve linear independence, and therefore it would have to be bijective. Am I missing something? Also, did you mean $\langle f(u), f(v) \rangle = \langle u, v \rangle$ (rather than $(f(u), f(v)) = (u, v)$)? – kjo Jun 27 '13 at 09:01
  • 2
    In the infinite-dimensional case, you can get isometries that fail to be surjective. For instance, on the Hilbert space $\ell^2(\mathbb{N})$ of square-summable sequences $(a_0,a_1,a_2,\dotsc)$, you can define the unilateral shift operator $S$ by $S(a_0,a_1,a_2,\dotsc) := (0,a_0,a_1,a_2,\dotsc)$; $S$ is an isometry, but not surjective, since anything in the range of $S$ is of the form $(0,\dotsc)$. And yes, thanks for noticing that! – Branimir Ćaćić Jun 27 '13 at 12:10
  • I feel like this answer concurs with another, less formal, answer I saw here while Googling about this question: https://www.quora.com/What-is-the-significance-of-the-adjoint – Tanner Strunk Dec 16 '17 at 01:08
6

The adjoint allows us to go from an active-transformation view of a linear map to a passive view and vice versa. Consider a map $T$, a vector $u$, and a set of basis covectors $e^i$ for $i = 1, 2, \ldots$. Given the definition of the adjoint, we have

$$\left[\underline T(u), e^i \right] = \left[u, \overline T(e^i) \right]$$

where the commonly used bracket notation $[x, f]$ means $f(x)$.

On the left, we're actively transforming $u$ to a new vector and evaluating components in some pre-established basis. On the right, we're passively transforming our space to use a new basis and evaluating $u$ in terms of that basis. So for each active transformation, there is a corresponding (and equivalent) passive one.
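In coordinates (my own sketch, where covectors are row vectors and the adjoint acts on them by $\phi \mapsto \phi \circ \underline{T}$, i.e. right-multiplication by the matrix of $\underline{T}$), the two sides of the bracket identity reduce to the same matrix product, which is exactly the point: the same numbers can be read either as 'new vector, old covectors' or as 'old vector, pulled-back covectors'.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
T = rng.standard_normal((n, n))   # matrix of the map (illustrative data)
u = rng.standard_normal(n)

# Active view: transform u, then read off its components in the fixed basis.
active = T @ u

# Passive view: pull each basis covector e^i back through T (phi -> phi o T,
# i.e. the row vector e^i @ T), then evaluate the untouched u against it.
E = np.eye(n)            # row i is the basis covector e^i
passive = (E @ T) @ u    # row i of E @ T is the pulled-back covector, applied to u

print(np.allclose(active, passive))  # True: same numbers, two readings
```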

That said, while I think this can help identify the meaning of the adjoint, I don't see how this helps make intuitive the theorems you described.

LMZ
  • 607
  • 3
  • 13
Muphrid
  • 18,790
  • 1
  • 23
  • 56
  • While this is a nice idea, I am afraid that something is not working. If the basis $e_i$ is orthonormal, then in the left hand side of $$\underline{T}(u)\cdot e_i=\overline{T}(e_i)\cdot u$$ you are truly expressing $\underline{T}(u)$ in components with respect to $e_i$. But in the right hand side you are not doing that, because $\overline{T}(e_i)$ need not be orthonormal. – Giuseppe Negro Jun 26 '13 at 18:48
  • I've changed to using basis covectors anyway to be more correct; I don't see any need for the basis to be orthonormal then. – Muphrid Jun 26 '13 at 18:57
6

It's difficult to give an intuitive description of the adjoint. Note that the adjoint is already there even before we have scalar products; a scalar product merely allows us to interpret the adjoint of a map $A:\ V\to V$ as a map on one and the same space $V$.

A linear map $A:\ V\to W$ from one vector space $V$ to some other vector space $W$ (of any dimensions) produces for each vector $x \in V$ a vector $y:=Ax\in W$.

Assume now that on $W$ a linear function $\phi:\ W\to{\mathbb R}$, i.e., an element of $W^*$, is given, which assigns, e.g., to each point $y\in W$ a temperature value $\phi(y)$, or computes for each $y\in W$ the first coordinate with respect to some basis of $W$. Then the function $$\psi:\quad V\to{\mathbb R},\qquad x\mapsto\phi\bigl(Ax\bigr)$$ computes for each point $x\in V$ the temperature felt at $Ax\in W$, "even before $x$ is actually mapped to $W$". In this way we can regard $\psi$ as a "virtual" temperature distribution on $V$. It is obvious that $\psi$ is a linear function from $V$ to ${\mathbb R}$, i.e., an element of $V^*$.

What we have described here for one $\phi\in W^*$ can of course be done with every $\phi\in W^*$: For each such $\phi$ we shall get a corresponding $\psi\in V^*$. All in all the map $A:\ V\to W$ given at the beginning induces a certain map $$W^*\to V^*, \quad \phi\to\psi\ .$$ This map is called the transpose of $A$ and is denoted by $A^*$. By definition we have the identity $$A^*\phi.x\ =\ \phi. Ax\qquad\forall x\in V,\ \forall\phi\in W^*\ .$$ Here the . means that the linear functional on the left of the dot is applied to the vector on the right of the dot.

An example: Assume that $(e_k)_{1\leq k\leq n}$ is a basis of $V$ and $(f_i)_{1\leq i\leq m}$ is a basis of $W$. Then $A$ has a certain matrix $[a_{ik}]$ with respect to these bases. Now let $$\phi:=f_1^*:\quad y\mapsto y_1\tag{1}$$ be the functional that computes the first coordinate of any given vector $y\in W$. Then $\phi.Ax$ is the first coordinate $y_1$ of the vector $y:=Ax$. We all know that $$y_1=\sum_{k=1}^n a_{1k} x_k\ .\tag{2}$$ Since we can write $x_k$ as $x_k=e_k^*.x$ we can interpret $(2)$ as $$A^*\phi.x=\phi.Ax=\sum_{k=1}^n a_{1k} e_k^*.x\qquad(x\in V)\ ,$$ or, given the definition $(1)$ of $\phi$: $$A^* f_1^*=\sum_{k=1}^n a_{1k} e_k^*\ .$$
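Here is a short numerical sketch of this computation (my own illustration, representing each functional by its coefficient row with respect to the dual bases $(f_i^*)$ and $(e_k^*)$, so that the transpose $A^*$ acts by $\phi \mapsto \phi\circ A$, i.e. right-multiplication by the matrix of $A$):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4                      # dim W = m, dim V = n  (illustrative sizes)
A = rng.standard_normal((m, n))  # matrix [a_ik] of A : V -> W in the bases (e_k), (f_i)

# Represent a functional on W by its coefficient row with respect to (f_i^*),
# so that phi(y) = row @ y.  The transpose A* maps phi to phi o A, i.e. row @ A.
f1_star = np.eye(m)[0]           # the functional y -> y_1
pullback = f1_star @ A           # coefficients of A* f_1^* with respect to (e_k^*)

print(np.allclose(pullback, A[0]))  # A* f_1^* = sum_k a_{1k} e_k^*, the first row of A

# The defining identity (A* phi).x = phi.(A x), checked on a random x in V.
x = rng.standard_normal(n)
print(np.isclose(pullback @ x, f1_star @ (A @ x)))  # True
```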

Christian Blatter
  • 216,873
  • 13
  • 166
  • 425
0

Here's a bit of intuition for the spectral theorem. Suppose $V$ is a finite dimensional inner product space over $\mathbb C$ and $T:V \to V$ is self-adjoint. It's easy to show that all the eigenvalues of $T$ are real and that eigenvectors corresponding to distinct eigenvalues are orthogonal.

Typically all the eigenvalues of $T$ are distinct. (It's in some sense an extraordinary coincidence if two roots of the characteristic polynomial coincide.) In this case, we see immediately that there is an orthonormal basis of eigenvectors for $T$.

If not all eigenvalues of $T$ are distinct, we can perturb $T$ slightly to obtain a self-adjoint transformation $\tilde{T}$ which does have an orthonormal basis of eigenvectors. By considering a sequence of slight perturbations $\tilde{T}_k$ which converges to $T$, we can hope to obtain a sequence of orthonormal bases which will converge to an orthonormal basis of eigenvectors of $T$.
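Here is a small numerical illustration of that perturbation picture (my own sketch with numpy, not part of the argument above): build a self-adjoint $T$ with a deliberately repeated eigenvalue, perturb it by $\epsilon E$ with $E$ Hermitian, and watch the orthonormal eigenbasis of the perturbed operator come closer and closer to diagonalizing $T$ itself as $\epsilon \to 0$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4

# A self-adjoint T with a deliberately repeated eigenvalue (illustrative data).
U, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
T = U @ np.diag([1.0, 1.0, 2.0, 3.0]) @ U.conj().T

Zp = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = Zp + Zp.conj().T              # a fixed Hermitian perturbation direction

for eps in [1e-1, 1e-3, 1e-5]:
    T_eps = T + eps * E           # still self-adjoint; generically simple spectrum
    _, Q = np.linalg.eigh(T_eps)  # orthonormal eigenbasis of the perturbed operator
    D = Q.conj().T @ T @ Q        # how close this basis comes to diagonalizing T itself
    off = D - np.diag(np.diag(D))
    print(eps, np.max(np.abs(off)))   # the off-diagonal part shrinks with eps
```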

littleO
  • 48,104
  • 8
  • 84
  • 154