
I can follow the definition of the transpose algebraically, i.e. as a reflection of a matrix across its diagonal, or in terms of dual spaces, but I lack any sort of geometric understanding of the transpose, or even symmetric matrices.

For example, if I have a linear transformation, say on the plane, my intuition is to visualize it as some linear distortion of the plane via scaling and rotation. I do not know how this distortion compares to the distortion that results from applying the transpose, or what one can say if the linear transformation is symmetric. Geometrically, why might we expect orthogonal matrices to be combinations of rotations and reflections?

Rodrigo de Azevedo
Zed
  • The geometry is in the inner product, say on $\mathbb{R}^n$: the transpose satisfies $\langle Ax,y \rangle=\langle x,A^Ty \rangle$, and orthogonal matrices satisfy $\langle x,y \rangle=\langle Ax,Ay \rangle$; they preserve the geometry. – yoyo May 06 '11 at 12:21
  • See [this answer](http://math.stackexchange.com/questions/598258/determinant-of-transpose/636198#636198) for a geometric description of the transpose. – Matt Jan 12 '14 at 21:19

4 Answers


To answer your second question first: an orthogonal matrix $O$ satisfies $O^TO=I$, so $\det(O^TO)=(\det O)^2=1$, and hence $\det O = \pm 1$. The determinant of a matrix tells you by what factor the (signed) volume of a parallelepiped is multiplied when you apply the matrix to its edges; therefore hitting a volume in $\mathbb{R}^n$ with an orthogonal matrix either leaves the signed volume unchanged (so it is a rotation) or multiplies it by $-1$ (so it is a reflection).
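A quick numerical illustration of this (a minimal NumPy sketch; the rotation angle and the reflection axis are arbitrary choices, not anything from the answer itself):

```python
import numpy as np

theta = 0.7  # an arbitrary rotation angle
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
reflection = np.array([[1.0,  0.0],    # reflection across the x-axis
                       [0.0, -1.0]])

for name, O in [("rotation", rotation), ("reflection", reflection)]:
    assert np.allclose(O.T @ O, np.eye(2))          # orthogonality: O^T O = I
    print(name, "det =", round(np.linalg.det(O)))   # rotation: 1, reflection: -1
```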

To answer your first question: the action of a matrix $A$ can be neatly expressed via its singular value decomposition, $A=U\Lambda V^T$, where $U$, $V$ are orthogonal matrices and $\Lambda$ is a matrix with non-negative values along the diagonal (n.b. this makes sense even if $A$ is not square!). The values on the diagonal of $\Lambda$ are called the singular values of $A$, and if $A$ is square and symmetric they will be the absolute values of the eigenvalues.

The way to think about this is that the action of $A$ is first to rotate/reflect to a new basis, then scale along the directions of your new (intermediate) basis, before a final rotation/reflection.

With this in mind, notice that $A^T=V\Lambda^T U^T$, so the action of $A^T$ is to perform the inverse of the final rotation, then scale the new shape along the canonical unit directions, and then apply the inverse of the original rotation.

Furthermore, when $A$ is symmetric, one can take $U = V$: the spectral theorem gives $A=U\Lambda U^T$ with $U$ orthogonal (at the cost of allowing negative diagonal entries in $\Lambda$). Therefore the action of a symmetric matrix can be regarded as a rotation to a new basis, then scaling in this new basis, and finally rotating back to the first basis.
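A minimal NumPy sketch of this whole picture (the symmetric matrix $A$ below is an arbitrary example chosen for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # an arbitrary symmetric matrix

# SVD: A = U @ diag(s) @ Vt, with U, Vt orthogonal and s >= 0
U, s, Vt = np.linalg.svd(A)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# The transpose swaps the roles of the two rotations: A^T = V @ diag(s) @ U^T
assert np.allclose(A.T, Vt.T @ np.diag(s) @ U.T)

# For a symmetric matrix, the spectral theorem gives A = Q @ diag(w) @ Q^T
# with a *single* orthogonal Q (rotate, scale, rotate back), and the
# singular values are the absolute values of the eigenvalues.
w, Q = np.linalg.eigh(A)
assert np.allclose(A, Q @ np.diag(w) @ Q.T)
assert np.allclose(np.sort(np.abs(w)), np.sort(s))
```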

Chris Taylor
  • You are probably talking about the singular value decomposition ("this makes sense even if $\mathbf A$ is not square!"), not the eigendecomposition. – J. M. ain't a mathematician May 06 '11 at 12:24
  • Oops, I originally was going to write about the eigendecomposition and then decided to write about svd instead, but forgot to change my wording. Will edit now - thanks! – Chris Taylor May 06 '11 at 12:37
  • Now, another thing: the phrase is supposed to be "if $A$ is square and *symmetric*..." – J. M. ain't a mathematician May 06 '11 at 12:59
  • "The action of a symmetric matrix can be regarded as a rotation to a new basis": I am confused by this statement. Rotation matrices don't have any eigenvectors and symmetric matrices have orthogonal eigenvectors, hence a symmetric matrix cannot be a rotation matrix. – Lenar Hoyt Jan 05 '16 at 00:29
  • @mcb You need to read the rest of the sentence - "can be regarded as a rotation to a new basis, then scaling in this new basis, and finally rotating back to the first basis". The effect of the two rotations cancels out, and the net effect is just a scaling (the magnitudes of the scaling are described by the eigenvalues of the symmetric matrix). – Chris Taylor Jan 05 '16 at 11:48
  • Why does $V\Lambda^T U^T = U\Lambda V^T \implies U = V$? I assume it's easy to prove, as it suffices to show $\Lambda = Q\Lambda Q \implies Q = I$ for any orthogonal real matrix $Q$, but I don't quite see how to carry out the proof immediately. – Detached Laconian May 14 '18 at 20:03
  • Brilliant response! – Christian Aug 24 '18 at 09:46
  • In order to interpret the geometric meaning of all square matrices, the background had better be homogeneous coordinates and projective geometry. – user6043040 Nov 03 '19 at 11:46
  • This still doesn't explain the difference between $V^T$ (the first rotation for $A$) and $V$ (the last rotation of $A^T$). – Jason Oct 25 '20 at 23:57
  • @DetachedLaconian It is actually not; there are some degrees of freedom. See https://math.stackexchange.com/a/644397/99220 – Hyperplane Jan 15 '21 at 13:19

yoyo has succinctly described my intuition for orthogonal transformations in the comments: from polarization you know that you can recover the inner product from the norm and vice versa, so knowing that a linear transformation preserves the inner product ($\langle x, y \rangle = \langle Ax, Ay \rangle$) is equivalent to knowing that it preserves the norm, hence the orthogonal transformations are precisely the linear isometries.
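For reference, the real form of the polarization identity mentioned here is

$$\langle x, y \rangle = \tfrac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2\right),$$

which is why, for a linear map, preserving the norm and preserving the inner product are equivalent.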

I'm a little puzzled by your comment about rotations and reflections because for me a rotation is, by definition, an orthogonal transformation of determinant $1$. (I say this not because I like to dogmatically stick to definitions over intuition but because this definition is elegant, succinct, and agrees with my intuition.) So what intuitive definition of a rotation are you working with here?

As for the transpose and symmetric matrices in general, my intuition here is not geometric. First, here is a comment which may or may not help you. If $A$ is, say, a stochastic matrix describing the transitions in some Markov chain, then $A^T$ is the matrix describing what happens if you run all of those transitions backwards. Note that this is not at all the same thing as inverting the matrix in general.
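A minimal sketch of this reversal intuition (using a plain 0/1 adjacency matrix rather than a genuine stochastic matrix, so no reweighting issues arise; the graph is an arbitrary example):

```python
import numpy as np

# Adjacency matrix of a small directed graph: M[i, j] = 1 iff there is
# an edge i -> j.  The edges form an arbitrary 3-cycle.
M = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

# Transposing reverses every edge: M.T[i, j] = M[j, i], i.e. i -> j in the
# transposed graph exactly when j -> i in the original.
edges          = [(i, j) for i in range(3) for j in range(3) if M[i, j]]
reversed_edges = [(i, j) for i in range(3) for j in range(3) if M.T[i, j]]
print(edges)           # [(0, 1), (1, 2), (2, 0)]
print(reversed_edges)  # [(0, 2), (1, 0), (2, 1)]
```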

A slightly less naive comment is that the transpose is a special case of a structure called a dagger category, which is a category in which every morphism $f : A \to B$ has a dagger $f^{\dagger} : B \to A$ (here the adjoint). The example we're dealing with here is implicitly the dagger category of Hilbert spaces, which is relevant to quantum mechanics, but there's another dagger category relevant to a different part of physics: the $3$-cobordism category describes how space can change with time in relativity, and here the dagger corresponds to just flipping a cobordism upside-down. (Note the similarity to the Markov chain example.) Since relativity and quantum mechanics are both supposed to describe the time evolution of physical systems, it's natural to ask for ways to relate the two dagger categories I just described, and this is (roughly) part of topological quantum field theory.

The punchline is that for me, "adjoint" is intuitively "time reversal." (Unfortunately, what this has to do with self-adjoint operators as observables in quantum mechanics I'm not sure.)

Qiaochu Yuan
  • I do think that it has to be proved that rotations really are rotations in two and three dimensions (i.e. finding the rotation axes etc.). An important remark for the OP, which is especially easy to see in your post, is that contrary to scalings, the transpose is most naturally viewed as a map on a different space. – Phira May 06 '11 at 16:07
  • In what sense does "adjoint" intuitively correspond to "time reversal"? Would it be possible to make sense of this intuition without referring to category theory? – Elliott Jun 01 '11 at 23:05
  • @Elliott: that's exactly what I tried to explain in the paragraph above that, as well as the comment about Markov chains. – Qiaochu Yuan Jun 01 '11 at 23:09
  • I think what I really meant is, "I didn't understand this very well, can you please elaborate in a more elementary way?" but I'll make another attempt to understand what you said before asking you again. – Elliott Jun 01 '11 at 23:17
  • @Elliott: the Markov chain example can be understood as follows. Let's say we have a collection of boxes joined by tubes. In these boxes there are particles running around, and let's say each tube has a fixed direction in which it passes particles. The directions of the tubes determine a transition matrix (in a loose sense) describing the possible directions the particles can travel, and taking the transpose of this matrix corresponds to reversing the allowed direction of each tube. What does this have to do with time reversal? Well, if you ran a movie of the particles traveling in reverse... – Qiaochu Yuan Jun 01 '11 at 23:26
  • In order to interpret the geometric meaning of all square matrices, the background had better be homogeneous coordinates and projective geometry. – user6043040 Nov 03 '19 at 11:47

**Overview** If a matrix $A$ acting on vectors tells you how a vector is transformed, the matrix $A^T$ tells you how linear measurements of this vector are transformed.

If $E$ is your vector space, the "space of linear measurements", or dual space, is the space of all linear transformations $E \rightarrow \mathbb{R}$. Said differently, it is the space of all linear functions that take in a vector and output a number.

**Example** Let's work in $\mathbb{R}^2$.

  • We have a generic vector $X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$
  • A sample matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$
  • A sample linear measurement $h(X)=2x_1$, which we denote by $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$

Maybe this measurement has some physical meaning (total mass, momentum, ...). We want to know how the transformation $A$ affects $h$.

The familiar product $AX = \begin{pmatrix} x_1 + x_2 \\ x_2 \end{pmatrix}$ tells you how $X$ is transformed. Now that $X$ has changed, we might ask: what is the new measurement of $X$ by $h$? It is obtained by computing $h(AX)$:

$h(AX) = h\left(\begin{pmatrix} x_1 + x_2 \\ x_2 \end{pmatrix}\right) = 2x_1 + 2x_2$, which we denote as $\begin{pmatrix} 2 \\ 2 \end{pmatrix}$.

So when we transform by $A$, $h$ goes from $\begin{pmatrix} 2 \\ 0 \end{pmatrix}$ to $\begin{pmatrix} 2 \\ 2 \end{pmatrix}$.

It turns out that this transformation is precisely what the transpose is doing, as you can verify: $A^T\begin{pmatrix} 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \end{pmatrix}$
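A minimal NumPy check of this computation (the test vector $X$ is an arbitrary choice):

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
h = np.array([2, 0])      # the measurement h(X) = 2 * x1
X = np.array([5, 7])      # an arbitrary test vector

# Measuring the transformed vector agrees with applying the transposed
# matrix to the measurement: h . (A X) == (A^T h) . X
assert h @ (A @ X) == (A.T @ h) @ X
print(A.T @ h)            # [2 2]
```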

This is kind of hand-wavy; if you want more details, look up dual spaces, which make all of this precise.

Manu
  • How does this apply if $A$ is not a square matrix? I.e. suppose $A$ sends $m$-dimensional vectors to $n$-dimensional vectors. On the other hand, $A^T$ sends $n$-dimensional vectors to $m$-dimensional vectors. Using the letters from your example, $h$ is a linear functional over an $n$-dimensional vector space, which wouldn't be able to measure $X$, which is a vector in an $m$-dimensional vector space. – Frank Dec 06 '20 at 05:26
  • Yes the transpose maps a measurement on the output space to one on the input space. Maybe a way to interpret this intuitively is that A might map X into a very high dimensional space (like mapping a function to its Fourier spectrum) and we might care only about a little part of the spectrum (say the first harmonic). This little part that we care about is our h, it is a measurement on the output space. What we can do is apply A and then apply h, but that is a lot of work because we need to compute the whole spectrum. Instead we can use $A^Th$ to get the first harmonic from X directly. – Manu Dec 14 '20 at 04:02
  • The key equality is $(A^T h)(v) = h(Av)$ for all $v \in \mathbb{R}^m$ and $h \in (\mathbb{R}^n)^*$. – Manu Dec 14 '20 at 04:03

The geometric meaning of the transpose is best interpreted from the viewpoint of projective geometry, because only in projective geometry is it possible to interpret the transpose of all square matrices.

It would be difficult for the OP to grasp the necessary premises and mathematical background for the projective-geometric interpretation of a square matrix's transpose from a single answer of reasonable length, so I would recommend reading the article in this link first.

After reading the article in the above link, use the basic principles below:

  1. Some basic or elementary geometric transformations are equivalent to Householder's elementary matrices in homogeneous coordinates in projective geometry; this is how stereohomology is defined;

  2. All square matrices can be represented as products of elementary matrices;

  3. The transpose of any elementary geometric transformation exchanges the "center" and the "interface" of the transformation, per the definition of stereohomology.

Then the transpose of any square matrix has its own geometric meaning in projective geometry, though for complicated cases the geometric interpretation of the transpose might not be unique.

user6043040