It is well known that for invertible matrices $A,B$ of the same size we have $$(AB)^{-1}=B^{-1}A^{-1} $$ and a nice way for me to remember this is the following sentence:

The opposite of putting on socks and shoes is taking the shoes off, followed by taking the socks off.

Now, a similar law holds for the transpose, namely:

$$(AB)^T=B^TA^T $$

for matrices $A,B$ such that the product $AB$ is defined. My question is: is there any intuitive reason as to why the order of the factors is reversed in this case?

[Note that I'm aware of several proofs of this equality, and a proof is not what I'm after]

Thank you!

Rodrigo de Azevedo
    The transpose identity holds just as well when $A$ and $B$ are not square; if $A$ has size $m \times n$ and $B$ has size $n \times p$, where $p \neq m$, then the given order is the only possible order of multiplication of $A^T$ and $B^T$. – Travis Willse May 13 '15 at 06:48
    @Travis I never required the matrices to be square for the transpose identity: all I said was that the product $AB$ must be defined. – user1337 May 13 '15 at 06:53
    @user1337, I think Travis was just using non-square matrices as a way of seeing why we must have $(AB)^T = B^T A^T$ and not $A^TB^T$. If $A$ is $l \times m$ and $B$ is $m \times n$ then $AB$ makes sense and $B^T A^T$ is an $n \times m$ times an $m \times l$, which makes sense, but $A^T B^T$ doesn't work. – Jair Taylor May 13 '15 at 06:58
    @user1337 Jair's right, I didn't intend my comment as a sort of correction, just as an explanation that if there is some identity for $(AB)^T$ of the given sort that holds for all matrix products, matrix size alone forces a particular order. (BTW, Jair, I finished my Ph.D. at Washington a few years ago.) – Travis Willse May 13 '15 at 08:05
  • @Travis: Ah, I think I know who you are. I'm working with Sara Billey. I like UW a lot. :) – Jair Taylor May 13 '15 at 15:33
    Obligatory remark: the "socks and shoes" metaphor is due to Coxeter (*An Introduction to Geometry*, 2/e, p.33). – user1551 Mar 30 '19 at 15:14

13 Answers


One of my best college math professors always said:

Make a drawing first.

[animated image: a matrix product drawn on paper, then viewed from the back of the flipped page; original link broken]

Although he couldn't have made this one on the blackboard.

    This is an absolutely beautiful explanation. – Roger Burt May 13 '15 at 19:32
    Couldn't believe it can be explained in such a simple and fun way. Must plus one! – Vim May 14 '15 at 09:00
    This is the most trite answer I have ever seen. I can't decide whether it deserves an upvote or a downvote. – Christian Chapman May 16 '15 at 09:16
    (+1) @enthdegree: An upvote, particularly if the drawing were augmented to indicate the $i$th row of $A$, the $j$th column of $B$, and the $(i, j)$ entry of $AB$ (all in marker heavy enough to bleed through the paper, of course), so that when the paper is flipped the OP's question is immediately answered with justification. :) – Andrew D. Hwang May 16 '15 at 11:56
    Sparsity in this drawing is by design. The more clutter you add, the more confusing it becomes. Readers familiar with matrix multiplication will probably have drawn this two-faced $(i,j)$ double-line in their mind. – mdup May 17 '15 at 10:51
  • @mdup. +1. It did take me a while to understand this gif but when I finally figured it out it was just...incredible! – Vim May 18 '15 at 04:21
  • @mdup - this is 100% exactly what I was looking for. I wish more people like you taught mathematics! –  Aug 30 '16 at 04:17
    @AndrewD.Hwang Your comment helped to quickly understand the gif. It should have been a part of answer. – Akki Oct 08 '17 at 08:56
    Can anyone explain the picture to me? – J. Deff Mar 30 '19 at 12:46
  • The image link is broken. – Conifold Dec 18 '19 at 01:05
  • That is amazing response – maxadorable May 23 '20 at 01:33
  • I giggled for a quarter of a minute on seeing this explanation. May whoever comes across this gif like me have a very nice day. – BeBlunt May 24 '21 at 20:17

By dualizing $AB: V_1\stackrel{B}{\longrightarrow} V_2\stackrel{A}{\longrightarrow}V_3$, we have $(AB)^T: V_3^*\stackrel{A^T}{\longrightarrow}V_2^*\stackrel{B^T}{\longrightarrow}V_1^*$.

Edit: $V^*$ is the dual space $\text{Hom}(V, \mathbb{F})$, the vector space of linear transformations from $V$ to its ground field, and if $A: V_1\to V_2$ is a linear transformation, then $A^T: V_2^*\to V_1^*$ is its dual defined by $A^T(f)=f\circ A$. By abuse of notation, if $A$ is the matrix representation with respect to bases $\mathcal{B}_1$ of $V_1$ and $\mathcal{B}_2$ of $V_2$, then $A^T$ is the matrix representation of the dual map with respect to the dual bases $\mathcal{B}_1^*$ and $\mathcal{B}_2^*$.
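In this language the reversal is exactly the shoes-and-socks composition rule: writing out the definition $A^T(f) = f \circ A$ and using associativity of composition,

```latex
(AB)^T(f) \;=\; f \circ (AB) \;=\; (f \circ A) \circ B \;=\; B^T\!\bigl(A^T(f)\bigr),
\qquad \text{so } (AB)^T = B^T A^T.
```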

Alex Fok
    In other words: the dualizing functor $V \rightarrow V^*$ is contravariant. – Jair Taylor May 13 '15 at 06:47
  • I think you might elaborate though, explaining what $V^*$ is and what $A^T$ means in this context. Otherwise it seems more like a comment than an answer. – Jair Taylor May 13 '15 at 06:50
  • Actually I was trying to make my answer as concise as possible, as the OP is not looking for a proof. Sure, I could have been more precise. – Alex Fok May 13 '15 at 06:54
  • @AlexFok looks neat, however I don't know what dualizing means. Can you please elaborate? – user1337 May 13 '15 at 06:56
  • Just added some more explanation in the answer. Hope that helps. – Alex Fok May 13 '15 at 07:04
    I wish it would be emphasised more in teaching that transposing makes linear operators switch to work on the dual spaces. Many people – at least in science, not sure how it is in maths – aren't aware of this at all; I was never explicitly taught about it and it was a huge a-ha moment when I first found out. – leftaroundabout May 13 '15 at 20:01

Here's another argument. First note that if $v$ is a column vector then $(Mv)^T = v^T M^T$. This is not hard to see - if you write down an example and do it both ways, you will see you are just doing the same computation with a different notation. Multiplying the column vector $v$ on the right by the rows of $M$ is the same as multiplying the row vector $v^T$ on the left by the columns of $M^T$.

Now let $( \cdot , \cdot )$ be the usual inner product on $\mathbb{R}^n$, that is, the dot product. Then the transpose $N = M^T$ of a matrix $M$ is the unique matrix $N$ with the property

$$(Mu, v) = (u, Nv).$$

This is just a consequence of associativity of matrix multiplication. The dot product of vectors $u,v$ is given by thinking of $u,v$ as column vectors, taking the transpose of one and doing the dot product: $(u,v) = u^T v$.

Then $(Mu,v) = (Mu)^T v = (u^T M^T) v = u^T (M^Tv) = (u, M^Tv)$.

Exercise: Show uniqueness!

With this alternate definition we can give a shoes-and-socks argument. We have

$$( ABu, v) = (Bu, A^Tv) = (u, B^TA^Tv)$$

for all $u,v$, and so $(AB)^T = B^T A^T$. The argument is exactly the same as the one for inverses, except we are "moving across the inner product" instead of "undoing".
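A quick numerical sanity check of the "moving across the inner product" argument, as a pure-Python sketch (the helper functions are my own, not part of the answer):

```python
# Sketch: check (ABu, v) == (u, B^T A^T v) and (AB)^T == B^T A^T
# on a small example, using plain list-of-lists matrices.

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
u = [1, -1]
v = [2, 3]

AB = matmul(A, B)

# Adjoint property, moved across the inner product twice:
lhs = dot(matvec(AB, u), v)                                   # (ABu, v)
rhs = dot(u, matvec(matmul(transpose(B), transpose(A)), v))   # (u, B^T A^T v)
assert lhs == rhs

# And the identity itself:
assert transpose(AB) == matmul(transpose(B), transpose(A))
```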

Jair Taylor
    The second part is the best way of looking at this. The point is that the transpose is not really that natural of an operation by itself: it is important because it is the adjoint operation for the (real) Euclidean dot product. And the adjoint operation for *any* inner product has the property in question, for the same reason that the inverse operation has the property in question. – Ian May 13 '15 at 15:22

Each element of the matrix $AB$ is the inner product of a row of $A$ with a column of $B$.

$(AB)^T$ has the same elements that $AB$ does (just in different places), so its elements too must each come from a row of $A$ and a column of $B$.

However, if we want to start with $A^T$ and $B^T$, then a row of $A$ is the same thing as a column of $A^T$ (and vice versa for $B$ and $B^T$), so we need something that has columns of $A^T$ and rows of $B^T$. The matrix that we take columns from is always the right factor, so $A^T$ must be the right factor in the multiplication.

Similarly, $B^T$ must be the left factor because we need its rows (which are columns of the original $B$).

hmakholm left over Monica
  • Your first sentence begins by relating all the entries of $AB$ to the inner product. Considering how the inner product is the 'mantle centerpiece' of modern/abstract vector/Hilbert space theory, this is where to look for any 'intuitive insights'. (+1) – CopyPasteIt Jun 12 '19 at 14:24
  • You know for sure (combinatorics/counting) that the entries in $B^t A^t$ can be matched up $1:1$ with the entries of $(AB)^t$. Surely scrambled eggs is not what we expect! So check that any two matrix entries actually agree - intuition morphing into a complete proof. – CopyPasteIt Jun 12 '19 at 14:45
  • Managed to find a recent question where I could write this up: https://math.stackexchange.com/a/3259932/432081 – CopyPasteIt Jun 12 '19 at 15:27

A matrix is a collection of entries indexed by two subscripts. When we multiply two matrices, each entry of the result is a sum of products:

$$C_{ik} = \sum_j A_{ij} B_{jk} $$

Crucially, the 'middle' index, $j$, must be the same for both matrices (the first must be as wide as the second is tall).

A transpose is just a reversal of indices:

$$A_{ij}^T = A_{ji}$$

It should now go without saying that

$$C_{ik}^T = C_{ki} = \sum_j A_{kj} B_{ji} = \sum_j B_{ij}^T A_{jk}^T = (B^T A^T)_{ik}$$

Memory shortcut: with non-square matrices, the multiplication fails immediately if you forget to reverse the order when transposing.
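The index calculation can be verified mechanically, entry by entry; here is a small pure-Python sketch (the helper construction and example matrices are illustrative):

```python
# Sketch: check C^T[k][i] == sum_j B^T[k][j] * A^T[j][i] for every entry,
# where C = AB, using plain list-of-lists matrices.

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2

m, n, p = len(A), len(A[0]), len(B[0])

# C = AB: C[i][k] = sum_j A[i][j] * B[j][k]
C = [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(p)]
     for i in range(m)]

# A transpose is just a reversal of indices: X^T[a][b] = X[b][a]
At = [[A[i][j] for i in range(m)] for j in range(n)]   # n x m
Bt = [[B[j][k] for j in range(n)] for k in range(p)]   # p x n

# (B^T A^T)[k][i] agrees with C^T[k][i] = C[i][k] for every entry:
for k in range(p):
    for i in range(m):
        assert sum(Bt[k][j] * At[j][i] for j in range(n)) == C[i][k]
```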


the intuitive reason is that the entries of a product matrix are feynman path integrals, and transposing the matrixes corresponds simply to reversing the arrow of time for traveling along the paths.

(so it's practically the same idea as in your shoes-and-socks example: matrix transposition is about time-reversal, just like function inversion is about time-reversal.)

the (i,k)th entry in a product matrix ab is the sum over j of a(i,j).b(j,k). in other words, it's a sum over all "2-step paths" (i,j,k) from i to k, each path visiting one intermediate point j on its way from i to k.

this sum over paths is called a "feynman path integral". if you read feynman's original paper on the subject, focusing on the parts that are easy to understand, you'll see that that was feynman's basic message: that whenever you have a big long string of matrixes to multiply, each entry in the product matrix is a "sum over paths" aka "path integral", with the contribution of each particular path being a long product of "transition quantities", each associated with one transition-step along the path.

this "path" interpretation of matrix multiplication actually gets more intuitive for longer strings of matrixes, because then each path consists of many steps. for example each entry of a matrix product abc...z is a sum over 26-step paths; each path visits 27 points but with just 26 transition-steps from one point to the next.
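As Ethan Bolker's comment below notes, this is the same picture as counting walks in a graph via powers of the adjacency matrix; a small pure-Python sketch of that reading (the example graph is my own):

```python
# Sketch: entries of a matrix product as sums over paths, and transposition
# as path reversal. For an adjacency matrix M, (M*M)[i][k] counts two-step
# walks i -> j -> k, and transposing M reverses every edge.

def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

# Directed graph on 3 vertices with edges 0->1, 0->2, 1->2, 2->0:
M = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]

M2 = matmul(M, M)
# M2[0][2] == 1: the single two-step walk 0 -> 1 -> 2
# (the sum over the intermediate vertex j).
assert M2[0][2] == 1

# Reversing every edge turns each i -> k walk into a k -> i walk,
# which is exactly (M M)^T = M^T M^T:
assert transpose(M2) == matmul(transpose(M), transpose(M))
```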

    I very much doubt that this explanation will help the OP, but I found the analogy to Feynman path integrals told me something important about the path integrals. I'm not a physicist and never looked past a paragraph or two about them. Now I can see that they resemble counting paths in a graph by looking at powers of the adjacency matrix. – Ethan Bolker May 14 '15 at 14:45
    I don't think we really need to attach Feynman's name here, but in general this combinatorial view of matrix multiplication as a sum over walks is very helpful. The time-reversal interpretation is a pretty useful way of looking at the transpose. – Jair Taylor May 15 '15 at 01:04

$\hspace{3cm}$[image illustrating the street analogy below; original link broken]

Turn (transpose) onto street $B^T$, perpendicular to $B$; then turn (transpose) onto $A^T$, perpendicular to $A$.


[this is an attempt to combine two previously given answers, mdup's video demo and my "path-sum" story, so it might help to refer to those.]

after watching mdup's video demo i started wondering how it relates to the "path-sum" interpretation of matrix multiplication. the key seems to be that mdup's hand-drawn picture of the matrix product AB wants to be folded up to form the visible faces of an oblong box whose three dimensions correspond precisely to the points i, j, and k in a three-point path (i,j,k). this is illustrated by the pairs of pictures below, each pair showing the oblong box first in its folded-up 3-dimensional form and then in its flattened-out 2-dimensional form. in each case the box is held up to a mirror to portray the effect of transposition of matrixes.

in the first pair of pictures, the i, j, and k axes are marked, and in the folded-up 3-dimensional form you can see how transposition reverses the order of the axes from i,j,k to k,j,i. in the flattened-out 2-dimensional form you can see how it wants to be folded up because the edges marked j are all the same length (and also, because it was folded up like that when i bought the soap).

[photos: the box in folded-up and flattened-out form (source: ucr.edu); original links broken]

the second pair of pictures indicate how an entry of the product matrix is calculated. in the flattened-out 2-dimensional form, a row of the first matrix is paired with a column of the second matrix, whereas in the folded-up 3-dimensional form, that "row" and that "column" actually lie parallel to each other because of the 3d arrangement.

[photos: computing an entry of the product, in folded-up and flattened-out form (source: ucr.edu); original links broken]

in other words, each 3-point path (i,j,k) corresponds to a location inside the box, and at that location you write down (using a 3-dimensional printer or else just writing on the air) the product of the transition-quantities for the two transition-steps in the path, A_[i,j] for the transition-step from i to j and B_[j,k] for the transition-step from j to k. this results in a 3-dimensional matrix of numbers written on the air inside the box, but since the desired matrix product AB is only a 2-dimensional matrix, the 3-dimensional matrix is squashed down to 2-dimensional by summing over the j dimension. this is the path-sum: in order for two paths to contribute to the same path-sum they're required to be in direct competition with each other, beginning at the same origin i and ending at the same destination k, so the only index that we sum over is the intermediate index j.

the 3-dimensional folded-up form and the 2-dimensional flattened-out form have each their own advantages and disadvantages. the 3-dimensional folded-up form brings out the path-sums and the 3-dimensional nature of matrix multiplication, while the 2-dimensional flattened-out form is better-adapted to writing the calculation down on 2-dimensional paper (which remains easier than writing on 3-dimensional air even still today).

anyway, i'll get off my soapbox for now ...


(Short form of Jair Taylor's answer)

In the expression $v^tA B w$, vectors $v$ and $w$ "see" $A$ and $B$ from different ends, hence in different order.

Hagen von Eitzen

Considering the dimensions of the various matrices shows that reversing the order is necessary.

If $A$ is $m \times p$ and $B$ is $p \times n$, then

$AB$ is $m \times n$,

$(AB)^T$ is $n \times m$,

$A^T$ is $p \times m$ and $B^T$ is $n \times p$.

Thus $B^T A^T$ is defined and has the same dimensions as $(AB)^T$, while $A^T B^T$ is not even defined unless $m = n$.
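A minimal sketch of this dimension count in pure Python (the particular shapes are illustrative):

```python
# Sketch: with m = 2, p = 3, n = 4, only the reversed order of the
# transposes has compatible shapes.

def shape(M):
    return (len(M), len(M[0]))

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[0] * 3 for _ in range(2)]   # m x p = 2 x 3
B = [[0] * 4 for _ in range(3)]   # p x n = 3 x 4

At, Bt = transpose(A), transpose(B)
assert shape(At) == (3, 2) and shape(Bt) == (4, 3)

# B^T A^T: (4 x 3) times (3 x 2) -> 4 x 2, the shape of (AB)^T.
assert shape(Bt)[1] == shape(At)[0]

# A^T B^T: (3 x 2) times (4 x 3) -- inner dimensions 2 != 4, undefined.
assert shape(At)[1] != shape(Bt)[0]
```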


They key property of the transpose is that it is the unique matrix which satisfies $$ \langle Ax, y \rangle = \langle x, A^T y \rangle $$ for all $x,y$. Notice that \begin{align} \langle B A x, y \rangle &= \langle Ax, B^T y \rangle \\ &= \langle x, A^T B^T y \rangle \end{align} for all $x,y$. This shows that $$ (BA)^T = A^T B^T. $$

  • 48,104
  • 8
  • 84
  • 154

In electrical engineering this can be illustrated nicely as time reversal of FIR filters.

Convolution with an FIR filter can be represented by a circulant or Toeplitz matrix. The farther an entry lies from the diagonal, the further into the future or past it reaches.

The simplest example is probably a permutation matrix, which is a power of the representation matrix of a cyclic group generator.
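A small pure-Python sketch of the time-reversal claim in the circulant case (the construction and helper names are my own): the transpose of a circular-convolution matrix is the circular-convolution matrix of the time-reversed filter.

```python
# Sketch: for a circulant matrix C with C[i][j] = h[(i - j) mod n]
# (so C @ x is circular convolution of h with x), the transpose C^T
# is the circulant matrix of the time-reversed filter h[(-k) mod n].

def circulant(h):
    n = len(h)
    return [[h[(i - j) % n] for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(row) for row in zip(*M)]

h = [1, 2, 3, 4]
h_reversed = [h[0]] + h[:0:-1]    # time reversal mod n: [1, 4, 3, 2]

assert transpose(circulant(h)) == circulant(h_reversed)
```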


I'm going to make @JohnCFrain's answer a little more intuitive.

Let's say we have a matrix $A$, which is


And we have matrix $B$ which is


Remember that the number of columns of $A$ has to equal the number of rows of $B$.

Then we take $A^T$, which is


Because the rows and columns switch.

And we have $B^T$, which is


But we have to switch the order, because the number of columns of $A^T$ does not equal the number of rows of $B^T$ ($m \neq n$), so we can't multiply in that order (which is why we switched).

Hope this helped someone who happened to scroll down (4 years and 2 months after original post)! :D
