I am looking for an intuitive explanation as to why/how row rank of a matrix = column rank. I've read the proof on Wikipedia and I understand the proof, but I don't "get it". Can someone help me out with this ?

I find it hard to wrap my head around the idea of how the column space and the row space is related at a fundamental level.

  • 7
    Maybe the Jordan canonical form can help. According to that theory, up to similarity every matrix $M$ can be written as $$M=D+N,$$ where $D$ is diagonal and $N$ is nilpotent. Transposing, we get $$M^T=D+N^T,$$ so the matrix and its transpose have the same diagonal part. In particular, $M$ and $M^T$ share any information carried by the diagonal part, and the (column) rank is one such piece of information. This looks excessively complicated, but I cannot think of any simpler explanation. – Giuseppe Negro Mar 17 '13 at 16:55

21 Answers


You can apply elementary row operations and elementary column operations to bring a matrix $A$ to a matrix that is in both row reduced echelon form and column reduced echelon form. In other words, there exist invertible matrices $P$ and $Q$ (which are products of elementary matrices) such that $$PAQ=E:=\begin{pmatrix}I_k\\&0_{(n-k)\times(n-k)}\end{pmatrix}.$$ As $P$ and $Q$ are invertible, the maximum number of linearly independent rows in $A$ is equal to the maximum number of linearly independent rows in $E$. That is, the row rank of $A$ is equal to the row rank of $E$. Similarly for the column ranks. Now it is evident that the row rank and column rank of $E$ are identical (to $k$). Hence the same holds for $A$.
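The reduction to $E$ can be tried out on small examples. Below is a minimal sketch in Python using exact rational arithmetic; the function name `rank_normal_form` and the sample matrix are assumptions made for illustration only. It applies row and column operations until the matrix has the block form with $I_k$ in the top-left corner and zeros elsewhere, and returns that matrix together with $k$.

```python
from fractions import Fraction


def rank_normal_form(rows):
    """Reduce a matrix to the block form diag(I_k, 0) using row AND column operations.

    Returns (E, k): the reduced matrix and the number k of ones on its diagonal.
    (Hypothetical helper, written only to illustrate the argument.)
    """
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    k = 0
    while k < min(m, n):
        # Look for a nonzero entry in the not-yet-reduced lower-right block.
        pivot = next(
            ((i, j) for i in range(k, m) for j in range(k, n) if a[i][j] != 0),
            None,
        )
        if pivot is None:
            break
        i, j = pivot
        a[k], a[i] = a[i], a[k]                # row swap
        for row in a:                          # column swap
            row[k], row[j] = row[j], row[k]
        p = a[k][k]
        a[k] = [x / p for x in a[k]]           # scale the pivot row so the pivot is 1
        for r in range(m):                     # row operations clear column k
            if r != k and a[r][k] != 0:
                f = a[r][k]
                a[r] = [x - f * y for x, y in zip(a[r], a[k])]
        for c in range(n):                     # column operations clear row k
            if c != k and a[k][c] != 0:
                f = a[k][c]
                for r in range(m):
                    a[r][c] -= f * a[r][k]
        k += 1
    return a, k


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
E, k = rank_normal_form(A)
# The transpose is reduced to a form with the same number of ones.
E_t, k_t = rank_normal_form([list(col) for col in zip(*A)])
print(k, k_t)
```

Since every step is an invertible row or column operation, the count `k` is exactly the number $k$ in $PAQ=E$.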

darij grinberg
  • 41
    +1 Nice explanation. The essential point of this argument is that elementary row operations, which by construction don't alter the row rank, also do not alter the column rank (and similarly for column operations). This is because doing an elementary row operation just amounts to expressing all columns in a different basis. – Marc van Leeuwen Mar 18 '13 at 05:40
  • What is n? Also, isn't it true that you can have things other than zeros and ones in the reduced echelon form? – user2820379 Mar 13 '15 at 00:15
  • 1
    @user2820379 $n$ is the size of $A$. Also, as emphasized in the answer, $A$ is reduced to a matrix that is in **both** row echelon form and column echelon form. Therefore, everything off-diagonal would be zeroed out (by a row operation or a column operation) and what remain on the diagonal are ones and zeros. – user1551 Mar 13 '15 at 22:40
  • 1
    It is obvious that the *right multiplication* to $A$ by invertible matrix doesn't change its row rank; but I didn't find so obvious that *left multiplication* to $A$ by invertible matrix doesn't change its row rank. So I couldn't get the statement after equation in the answer. – Beginner Jul 27 '19 at 06:51
  • 1
    @Beginner See Marc van Leeuwen's comment above. That was what I thought when I wrote the answer. Alternatively, every row vector of the form $x^TA$ can be rewritten as a row vector of the form $y^TPA$ (with $x^T=y^TP$), and vice versa. Hence the row space of $A$ has the same dimension as the row space of $PA$, i.e. $A$ has the same row rank as $PA$. And as you said, $PA$ has the same row rank as $PAQ$, because it's a right multiplication by an invertible matrix. – user1551 Jul 27 '19 at 16:15
  • @user1551 What if the size of $A$ is $m\times n$? Change $0_{(n-k)\times (n-k)}$ to $0_{(m-k)\times (n-k)}$? – ecook Mar 25 '20 at 05:13
  • @ecook Yes. Alternatively, you may turn the matrix into a square one by padding it with zeroes first. – user1551 Mar 25 '20 at 05:56
  • Thank you for your answer – ecook Mar 25 '20 at 14:49

This post is quite old, so my answer might come a bit late. If you are looking for an intuition (you want to "get it") rather than a demonstration (of which there are several), then here is my 5c.

If you think of a matrix $A$ in the context of solving a system of simultaneous equations, then the row rank of the matrix is the number of independent equations, and the column rank is the number of independent parameters that you can estimate from the equations. That, I think, makes it a bit easier to see why they should be equal.



Let $V,W$ be two vector spaces such that $\dim(V)=n$.

The column rank of a matrix $A=\alpha_{(V,W)}(\varphi)$ is equal to the dimension of the image of the linear map $\varphi$, so

$$\text{column rank}=\dim(\text{im}(\varphi))$$

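On the other hand, the row rank is tied to the dimension of the kernel of $\varphi$: the kernel is the solution set of $Ax=0$, and after row reduction its dimension is the number of columns without a pivot, namely $$\dim(\ker(\varphi))=n-\text{row rank}.$$ Now, we know from the rank-nullity theorem that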

$$\dim(\ker(\varphi))+\dim(\text{im}(\varphi))=\dim(V)=n$$ $$\implies n-\text{row rank}+\text{column rank}=n$$ $$\implies \text{column rank}=\text{row rank}$$
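As a small worked instance of these formulas (the matrix below is chosen only for illustration), take $n=3$ and

$$A=\begin{pmatrix}1&2&3\\2&4&6\end{pmatrix}.$$

The second row is twice the first, so the row rank is $1$. The kernel is the plane $x_1+2x_2+3x_3=0$, so $\dim(\ker(\varphi))=2=n-\text{row rank}$. Every column is a multiple of $\begin{pmatrix}1\\2\end{pmatrix}$, so $\dim(\text{im}(\varphi))=1$, and indeed

$$\dim(\ker(\varphi))+\dim(\text{im}(\varphi))=2+1=3=n .$$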

  • 3
    This is maybe the most pedagogical of the answers. You need of course a separate proof of the rank-nullity theorem, but this shows that the rank result is intimately related, in fact equivalent, to that theorem, which students should know about anyway. I would maybe add that the dimension $n$ of the space of departure is the number of columns of the matrix, and that $\ker(\phi)$ is by definition the solution set to the homogeneous system $Ax=0$ associated to $A$. If found after row-reducing $A$, $\dim\ker(\phi)$ is the number of columns without pivot, which is $n$ minus the row rank. – Marc van Leeuwen Sep 27 '16 at 16:31
  • 7
    Why is $\dim\ker\phi = n - \text{row rank}$? I found it easier to relate row rank to $(\ker\phi^T)^\perp$, so I end up passing through the dimension of the codomain $m=\dim W$ with $\text{row rank}=m-(m-r)$, rather than using the dimension of the domain $n=\dim V$ to get $\text{row rank}=n-(n-r)$ as you have done. – ziggurism Feb 22 '17 at 19:22
  • @ziggurism If you see the Matrix as a linear system the dimension of its kernel is the number of "useless" equations, if you understand what I mean. – Lonidard Feb 22 '17 at 19:24
  • 1
    I guess it can go two ways: 1. relate the rowspace $\operatorname{im}\phi^T$ to the kernel $\ker\phi\cong(\operatorname{im}\phi^T)^\perp$ by orthogonal complement in the domain so that $\text{row rank}=n-\dim\ker\phi$ and then first isomorphism theorem gives $V/\ker\phi\cong \operatorname{im}\phi$ so that $\text{row rank}=n-(n-r)=r$, or else 2. relate the rowspace $\operatorname{im}\phi^T$ in the domain to orthogonal complement of the kernel of the transpose $(\ker\phi^T)^\perp$ in the codomain by the first isomorphism theorem, and thence to the complementary space $\ker\phi^T\cong(\operatorname{im}\phi)^\perp,$ from which we have $\text{row rank}=m-(m-r)$. – ziggurism Feb 23 '17 at 03:47
  • So you chose the first option, and I landed on the second option. Of course the two options are equivalent, for reasons which I suppose boil down to OP's question. – ziggurism Feb 23 '17 at 03:49

One way to view the rank $r$ of an $n\times m$ matrix $A$ with entries in a field $K$ is that it is the smallest number such that one can factor the linear map $f_A:K^m\to K^n$ corresponding to $A$ through an intermediate space of dimension$~r$, in other words, as a composition $K^m\to K^r\to K^n$ (taking $C$ and $B$ as the matrices corresponding to the two steps, this means that one has a decomposition $A=BC$ of $A$ as the product of an $n\times r$ and an $r\times m$ matrix). Now one can always factor $f_A$ through the image $\operatorname{Im}f_A\subseteq K^n$, as $K^m\to\operatorname{Im}f_A\hookrightarrow K^n$, and on the other hand this image can never have a dimension larger than a space through which $f_A$ factors; therefore the rank is equal to $\dim\operatorname{Im}f_A$. But that dimension is equal to the maximal number of independent columns of $A$, its column rank.

On the other hand, one can view the rows of $A$ as linear functions on $K^m$ that describe the coordinates of $f_A(x)$ as a function of $x$, and the row rank $s$ is the maximum number of independent such functions; once such a set of $s$ independent rows is chosen, the remaining coordinates of $f_A(x)$ can each be described by a fixed linear combination of the chosen coordinates (because their rows are such linear combinations of the chosen rows). But this means that one can factor $f_A$ through $K^s$, with the map $K^s\to K^n$ reconstructing the dependent coordinates. The chosen coordinates are independent, so there is no nontrivial relation between them, and the map $K^m\to K^s$ is therefore surjective. This means that $f_A$ cannot factor through a space of smaller dimension than $s$, so the row rank $s$ is also equal to the rank of $A$.

Instead of that separate argument involving the row rank, you can also interpret the row rank of $A$ as the column rank of the transpose matrix $A^t$. Now one can factor $A=BC$ if and only if one can factor $A^t=C^tB^t$; then the minimal $r$ such that one can write $A=BC$ with $B\in M_{n,r}$ and $C\in M_{r,m}$ (the column rank of $A$) obviously equals the minimal $r$ such that one can write $A^t=C^tB^t$ with $C^t\in M_{m,r}$ and $B^t\in M_{r,n}$ (the column rank of $A^t$, and row rank of $A$).

darij grinberg
Marc van Leeuwen

Define the rank of a matrix $A$ as the largest size of any square submatrix (minor) with nonzero determinant. Then, if you see the columns of $A$ as vectors, the rank of $A$ can be thought of as the maximal number of linearly independent such vectors. Finally, note that $\det(M)=\det(M^T)$, and if $M$ is a minor of $A$, then $M^T$ is a minor of $A^T$.

  • 1
    Related: [Proof that determinant rank equals row/column rank](http://math.stackexchange.com/questions/187497/proof-that-determinant-rank-equals-row-column-rank) – Martin Sleziak Jun 05 '15 at 09:21

I think Strang's "four subspaces" picture is enlightening here. Assume $A \in \mathbb R^{m \times n}$. It's easy to prove that the null space of $A$ and the image of $A^T$ are orthogonal complements. Also, $A$ (as a mapping) is one to one when restricted to the range of $A^T$. So the range of $A^T$ is actually isomorphic to the range of $A$, and $A$ itself provides the isomorphism!

Strang presents the four subspace picture in the context of inner product spaces (in fact just $\mathbb R^n$). But the picture works for arbitrary finite dimensional vector spaces if we use annihilators instead of orthogonal complements. This gives an easy, conceptual proof that $A$ and $A^T$ have the same rank. See chapter two of Lax's linear algebra book for the (easy) details.


The shortest and most instructive proof I have seen so far:

First, recall that if the $m \times n$ matrix $A = BC$ is a product of the $m \times r$ matrix $B$ and the $r \times n$ matrix $C$, then it follows from the definition of matrix multiplication that the $i$-th row of $A$ is a linear combination of the $r$ rows of $C$ with coefficients from the $i$-th row of $B$, and the $j$-th column of $A$ is a linear combination of the $r$ columns of $B$ with coefficients from the $j$-th column of $C$. (If you have trouble understanding this or the next paragraph, you should construct several examples of small matrix products, say, a $3 \times 2$ times a $2 \times 3$ matrix, etc., with small integer as well as symbolic entries.)
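For instance, here is one such $3 \times 2$ times $2 \times 3$ product (the entries are chosen only for illustration):

$$A = BC = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 2 & 7 & 3 \end{pmatrix}.$$

The third row of $A$ is $2\,(1,2,0)+3\,(0,1,1)=(2,7,3)$, a combination of the rows of $C$ with coefficients from the third row of $B$. The second column of $A$ is $2\begin{pmatrix}1\\0\\2\end{pmatrix}+1\begin{pmatrix}0\\1\\3\end{pmatrix}=\begin{pmatrix}2\\1\\7\end{pmatrix}$, a combination of the columns of $B$ with coefficients from the second column of $C$.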

On the other hand, if any collection of $r$ row vectors $c_1,c_2,\dots,c_r$ spans the row space of $A$, an $r \times n$ matrix $C$ can be formed by taking these vectors as its rows. Then the $i$-th row of $A$ is a linear combination of the rows of $C$, say $b_{i1}c_1 + b_{i2}c_2 +\dots + b_{ir}c_r$. This means $A = BC$, where $B = (b_{ij})$ is the $m \times r$ matrix whose $i$-th row, $b_{i}$, is formed from the coefficients giving the $i$-th row of $A$ as a linear combination of the $r$ rows of $C$.

Similarly, if any $r$ column vectors span the column space of $A$, and $B$ is the $m \times r$ matrix formed by these columns, then the $r \times n$ matrix $C$ formed from the appropriate coefficients satisfies $A = BC$.

Now the four sentence proof.

THEOREM: If $A$ is an $m \times n$ matrix, then the row rank of $A$ is equal to the column rank of $A$.
Proof: If $A = 0$, then the row and column rank of $A$ are both $0$; otherwise, let $r$ be the smallest positive integer such that there is an $m \times r$ matrix $B$ and an $r \times n$ matrix $C$ satisfying $A = BC$. Thus the $r$ rows of $C$ form a minimal spanning set of the row space of $A$ and the $r$ columns of $B$ form a minimal spanning set of the column space of $A$. Hence, row and column ranks are both $r$.
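A factorisation $A = BC$ with the smallest possible $r$ can be computed for small examples. The sketch below is one way to do it in Python with exact fractions, assuming a nonzero input matrix: it takes $B$ to be the pivot columns of $A$ and $C$ to be the nonzero rows of the reduced row echelon form of $A$. The helper names `rref`, `rank_factorization` and `matmul` are hypothetical and chosen only for this illustration; this is a standard construction, not necessarily the one the journal proof has in mind.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the reduced row echelon form and its pivot column indices."""
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    pivots = []
    r = 0
    for c in range(n):
        if r == m:
            break
        # Find a row at or below r with a nonzero entry in column c.
        i = next((i for i in range(r, m) if a[i][c] != 0), None)
        if i is None:
            continue
        a[r], a[i] = a[i], a[r]
        p = a[r][c]
        a[r] = [x / p for x in a[r]]           # make the pivot equal to 1
        for k in range(m):                     # clear the rest of column c
            if k != r and a[k][c] != 0:
                f = a[k][c]
                a[k] = [x - f * y for x, y in zip(a[k], a[r])]
        pivots.append(c)
        r += 1
    return a, pivots


def rank_factorization(rows):
    """Return (B, C) with A = B C for a nonzero matrix A.

    B holds the pivot columns of A (m x r); C holds the nonzero rows of rref(A) (r x n).
    """
    R, pivots = rref(rows)
    B = [[Fraction(row[c]) for c in pivots] for row in rows]
    C = R[:len(pivots)]
    return B, C


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
B, C = rank_factorization(A)
print(len(B[0]), len(C))       # number of columns of B and number of rows of C
print(matmul(B, C) == A)
```

The same pair read backwards gives $A^T = C^T B^T$, which is why the minimal $r$ is the same for $A$ and its transpose.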

NOTE: This proof is taken from an MAA journal.

beta_me me_beta
  • 1
    This is just brilliant: it uses only spanning sets to prove the existence of $B$ or $C$ when the other is given, and since $B$ and $C$ share the same smallest $r$, the row and column ranks are the same. – linear_combinatori_probabi Oct 09 '20 at 09:18
  • Other answers cannot be remembered after a year, thus useless, lol. – linear_combinatori_probabi Oct 09 '20 at 10:12
  • 1
    I think the way this proof (by William Wardlaw, taken [from here](https://www.maa.org/sites/default/files/3004418139737.pdf.bannered.pdf)) is presented is slightly lacking. It concludes by saying that the rows of $C$ are a basis of the row space of $A$, but in general if $A=BC$ the rows of $C$ need not even belong to the row space of $A$ (e.g. $B=A, C=I_n$). It's only minimality of $r$ that forces that to happen - but there's no justification of how. – Matthew Towers Mar 20 '22 at 21:12
  • (ctd) otoh there is a proof here: the paragraphs preceding the theorem establish that the row rank of $A$ equals the minimal $r$ such that there is a factorisation $A=BC$ where $B$ is $m\times r$, and that the column rank also equals this. – Matthew Towers Mar 20 '22 at 21:12

An alternative 'symmetric' definition for the rank might help here.

Define the rank of $A \in F^{m\times n} $ as the minimal $k$ for which $A$ can be represented as $$ A = c_1 r_1 + \cdots + c_k r_k$$ where $c_i \in F^{m\times 1}$ and $r_i \in F^{1 \times n}$.

Then it follows that each row of $A$ is a linear combination of $r_1, \cdots, r_k$, since the $i$th row of $A$ will be $(c_1)_i r_1 + \cdots + (c_k)_i r_k$. This implies $\text{row rank} \le k$.

On the other hand, if we let $r'_1, \cdots, r'_t$ be a basis of the row space, we can write each row of $A$ as a linear combination of the $r'_i$. Then by collecting the coefficients of $r'_i$ into a column vector $c'_i$, we have $A = c'_1 r'_1 + \cdots + c'_t r'_t$. Thus $\text{row rank} = t \ge k$ by the minimality of $k$.

Summing these up, we have $k = \text{row rank}$. With the same arguments, one can easily prove $k = \text{column rank}$.

Dongryul Kim

I was brought here by a comment on this question, but while many of the answers are very good, none of them quite address how I think about this question.

That said, this answer is most similar to littleO's answer, but it's a different perspective. As a geometer, I like to think about the equations that vanish on a space.

Thus, if I have a linear map $T : V\to W$, we note that it induces a map $T^*:W^*\to V^*$, and the kernel of $T^*$ is precisely the set of linear functionals on $W$ that vanish on the image of $T$. The dimension of a subspace generally is the dimension of the ambient space minus the number of equations needed to cut out the subspace. Thus $$\newcommand\im{\operatorname{im}}\dim\im T = \dim W-\dim\ker T^*.$$ By rank nullity, we can relate the dimension of the kernel of $T^*$ to the rank of $T^*$, which is the row rank of $T$ to get, $$\dim\im T^*=\dim W-\dim \ker T^*,$$ so we have $$\dim\im T=\dim\im T^*,$$ or row rank equals column rank.


This can also be understood in terms of the singular value decomposition. Although proving the SVD takes a bit of work, once proved it provides a more thorough intuition about what is going on.

Geometrically, the SVD says that a matrix maps the unit (hyper-)sphere in the domain to a (hyper-)ellipsoid in the range, possibly squashing some axes of the ellipsoid to be flat. Moreover, the axes of the ellipsoid in the range correspond to an orthonormal set of vectors on the sphere in the domain.

The non-flat axes of the ellipsoid span the column space, and their corresponding axes on the sphere span the row space. Since these axes are orthonormal and in one-to-one correspondence, the fact that their dimension is the same becomes trivial.
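In symbols, here is a sketch for a real $m\times n$ matrix $A$ with exactly $r$ nonzero singular values (the names $u_i$, $v_i$, $\sigma_i$ are the usual SVD notation and are introduced here only for illustration):

$$A = \sum_{i=1}^{r} \sigma_i\, u_i v_i^T, \qquad \sigma_1 \ge \cdots \ge \sigma_r > 0,$$

where $u_1,\dots,u_r \in \mathbb{R}^m$ are orthonormal and $v_1,\dots,v_r \in \mathbb{R}^n$ are orthonormal. Then

$$A v_i = \sigma_i u_i, \qquad A^T u_i = \sigma_i v_i,$$

so the column space is spanned by $u_1,\dots,u_r$ and the row space by $v_1,\dots,v_r$. Both are bases with $r$ elements.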

Nick Alger
  • Also see [this excellent answer](https://math.stackexchange.com/a/636198) which states the same in a bit more detail – akraf Nov 14 '17 at 16:54

Let $V$ be a vector space of dimension $n$ and $W$ one of dimension $m$. If $M$ is an $m \times n$ matrix, it gives a linear map from $V$ to $W$. The image of this map is the column space. The dual map $W^* \rightarrow V^*$ is given by the transpose $M^T$. The image of this map is therefore the row space. Further, the Hom functor is exact in this context, so the exact sequence $$0 \rightarrow \ker(M) \rightarrow V \overset M \rightarrow M(V) \rightarrow 0$$ gives an exact sequence

$$ 0 \rightarrow M(V)^* \overset {M^T|_{M(V)^*}} \rightarrow V^* \rightarrow \ker(M)^* \rightarrow 0.$$

$M(V)$ has the same dimension as $M(V)^*$. By exactness of Hom again, the image of the map $M^T|_{M(V)^*}$ (which is $M(V)^*$) coincides with the image of $M^T$, the row space.

Note: I think only left exactness of Hom is used here...


Given a linear system of equations:$$\begin{cases}ax+by = c_1\\ cx+dy = c_2\end{cases}$$ Is there a solution?

One way to answer this: if the lines are not parallel, then there is a solution. Writing each equation in slope-intercept form (assuming $b, d \neq 0$) gives $y=-\dfrac abx + \dfrac{c_1}b$ and $y=-\dfrac cdx + \dfrac{c_2}d$.

If the lines are to intersect, the slopes cannot be equal: that is, $\dfrac ab \neq \dfrac cd$, which gives $ad \neq bc\implies ad-bc \neq0$. In other words, the determinant cannot equal zero.

Alternatively, solving for $x$ instead (assuming $a, c \neq 0$): $x=-\dfrac bay + \dfrac{c_1}a$ and $x =-\dfrac dcy + \dfrac{c_2}c$.

Then $\dfrac ba \neq \dfrac dc$, and again $da-bc \neq 0$.

You can also see that the coefficients of $x$ and $y$ are "intimately" related.

I doubt the above is rigorous enough for most people, but this helps me see the relationship between rows and columns $\ldots$ between coefficients.



Ok. The best way is to take a concrete matrix. Take any matrix without loss of generality.

$$ A = \begin{pmatrix} 3 & 12 & 10 & 3 \\ 14 & 3 & 13 & 7 \\ 5 & 15 & 4 & 9 \end{pmatrix} $$

If you look at this matrix, you notice that its columns are four 3-tuples of real numbers, each of which can be represented as a point in $\mathbb{R}^3$. The first column is the tuple $(3,14,5)$, a point in 3D space.

So the column rank in our case cannot be more than 3 because we use 3 values to locate all points in 3D-space. Same goes for the rows, and the row rank of this matrix cannot be more than 4. It can be 4, but no more.


Some rows are not independent

Now imagine that without loss of generality you could represent the first row as a linear combination of other rows, e.g.

$$R_1 = aR_2 + b R_3$$

You fix these values $a,b$ mentally and see what they mean for the columns of $A$.


Translation into column space

They mean that in each column every top cell is a linear combination of all other values in the same column (with the same values $a,b$). That is

Value 3 is a linear combination of 14 and 5. Value 12 is a linear combination of 3 and 15.

Value 10 is a linear combination of 13 and 4. Value 3 is $7a+9b$.

Conclusion

The main point is that we can do linear combinations of rows and columns with the same scalars $a,b$. So the column rank of our matrix would be 2. In our case the first coordinate of our points in $\mathbb{R}^3$ is redundant.

The rank of our matrix is the smallest of the column rank and the row rank.
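To make the hypothetical concrete, here is a small matrix (different from the one above, with entries chosen only for illustration) whose first row really is a combination of the other two, with $a=2$ and $b=1$:

$$B = \begin{pmatrix} 2 & 1 & 7 \\ 1 & 0 & 2 \\ 0 & 1 & 3 \end{pmatrix}, \qquad R_1 = 2R_2 + R_3 .$$

Reading the same relation down each column gives $2 = 2\cdot 1 + 0$, $1 = 2\cdot 0 + 1$ and $7 = 2\cdot 2 + 3$: every column is a point $(x,y,z)$ on the plane $x = 2y + z$. Three points in a plane through the origin cannot be independent, and indeed

$$\begin{pmatrix} 7 \\ 2 \\ 3 \end{pmatrix} = 2\begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} + 3\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},$$

so the row rank and the column rank of $B$ are both $2$.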

Mikhail D

I'll add a (moderately) high-brow (intuitive or not is up to you) explanation. The equality of row and column ranks is a consequence of the abstract relationship $(Im A)^* \equiv (Im A^*)$, where $A:V\to W$ is any linear map, and $A^*:W^*\to V^*$ is the dual map defined by $A^*w^*(v)=w^*(Av)$. To see that $(Im A)^*= (Im A^*)$, we define a non-degenerate bilinear pairing between $Im A$ and $Im A^*$.

Namely, if $w\in Im A$ and $v^*\in Im A^*$ we pick $v\in V$ and $w^*\in W^*$ such that $w=Av$ and $v^*=A^*w^*$, and we see

$$v^* (v)=A^*w^* (v)=w^*(Av)= w^*(w)$$

and in particular all of the above expressions do not actually depend on the choices of either $v$ or $w^*$, and so we can define $ \langle v^*, w\rangle$ to be equal to all of the above.

It is obvious that this pairing is bilinear, so it remains to see that it is non-degenerate. With that in mind, suppose for some fixed $v^*=A^* w^*\in Im A^*$ we have $\langle v^*, w\rangle=0$ for all $w \in Im A$; then for any $u\in V$ we have $v^*(u)=A^* w^*(u) = w^*(Au)=\langle v^*, Au\rangle=0$; thus $v^*$ vanishes on all inputs and is zero. Since the roles of $A$ and $A^*$ are completely symmetric, we conclude that the pairing is indeed non-degenerate.

Some remarks:

1) The "4 subspaces" picture referred to in one of the other answers is a version of this when $V$ and $W$ are identified with $V^*$ and $W^*$ via an inner product.

2) Alternatively, one can argue as follows. There is a natural isomorphism $Im A=V/Ker A$ (this is known as the First isomorphism theorem, with map induced by $A$ providing the isomorphism; this is what gives the "rank-nullity" theorem); then $Im A^*=W^*/Ker A^*$. It is easy to see from definition that $Ker A^*=Ann Im A$ ("the 4 subspaces are orthogonal in pairs"). Then what we are proving reduces to showing $W^*/Ann Im A= (Im A)^*$, which is an instance of the general isomorphism that for any subspace $U\subseteq W$ we have $W^*/Ann U\equiv U^*$ (this actually also follows from the first isomorphism theorem applied to the map $W^* \to U^*$ induced by restriction of linear functionals from $W$ to $U$; its kernel is $Ann U$ and its image is all of $U^*$ (any functional can be extended)).

  • Hi, I'm interested in the viewpoint of your answer, but I cannot fully understand it at my current understanding of this subject. May I ask you that where (related books or subjects) can I learn/find the relationship $(ImA)^*\equiv(ImA^*)$? – linear_combinatori_probabi Sep 20 '20 at 18:46
  • 1
    I was going to write "If I knew, I would not have written this answer" -- which is still true. However, the following come fairly close, even if they fall short of explicitly stating the relationship above: Dummit and Foote "Abstract Algebra", Section 11.3 and Corollary 21; Kostrikin and Manin, Linear Algebra and Geometry, Ch1, particularly 1.7, exercise 4. I am sure there are others. Let me know if you find another source. – Max Sep 20 '20 at 21:09

Think of a matrix $A$. Find the kernel by row-reducing the augmented matrix $[A|0]$, and observe that pivots are determined by non-pivots. So intuitively, $\dim \ker A = \text{# non-pivots}$.$^{1}$ Elementary row operations preserve row rank,$^{2}$ so row rank of $A$ = row rank of rref($A$) = # pivots. Now use the rank-nullity theorem$^{3}$ to conclude that $$ \text{row rank of $A$} = \text{# pivots} = n - \text{# non-pivots} = n - \text{dim ker $A$} = \text{dim im $A$} = \text{col rank of $A$}. $$


  1. More rigorously: recall the method of finding a basis for the kernel of a matrix, where the number of vectors in this basis is the number of non-pivots. See for example Finding null space of matrix.

  2. See for example Why do elementary matrix operations not affect the row space of a given matrix?

  3. The rank-nullity theorem says that $n = \text{dim ker $A$} + \text{dim im $A$}$, where $n$ is the number of columns. A proof of this follows by observing that if we have a basis for the kernel and we extend it to a basis of the whole space, then the images of the vectors that were added is a basis for the image.

  • Your answer would be clearer if you explained what a "pivot" is, so that a reader doesn't have to click the links to know what you mean. – linear_combinatori_probabi Sep 20 '20 at 09:13
  • 1
    Perhaps I should have used the following terminology instead: By "pivots", I mean **leading variables**. By "non-pivots", I mean **free variables**. – twosigma Sep 20 '20 at 17:32

To see this without calculation, I guess the key is to realize that (1) the row space of a matrix is the dual space of the column space of the matrix, where the dot product of a row vector and a column vector defines this dual relationship, and (2) a basis and its dual basis are in one-to-one correspondence under the dot product rule. Combining (1) and (2), the ranks are the same.


Consider an $ m \times n $ matrix $ A = (a_{ij}) $. By row reduction, there exist elementary matrices $ E_1, \cdots ,E_p$ such that $ E_p \cdots E_1 A $ is a step matrix $ A' $.

[The first non-zero entry of a row, if it exists, is called a pivot of that row. For an $ m \times n $ matrix $ A $, we'll denote its rows by $ A_{1*}, \cdots, A_{m*} $ and columns by $ A_{*1}, \cdots, A_{*n} $. A matrix $ A $ is in step form when its zero rows (if any) are at the bottom, and pivot indices $ j_1, \cdots, j_r $ of non-zero rows $ A_{1*}, \cdots , A_{r*} $ satisfy $ j_1 < j_2 < \cdots < j_r $. Step matrices are also called row echelon matrices.]

Say $ A' $ has $ r $ non-zero rows with pivot indices $ j_1 < \cdots < j_r $.

For any elementary matrix $ E $, $ A \mapsto EA $ preserves the row space (i.e. the span of the matrix rows). So in particular the row rank (i.e. the dimension of the row space) of $ A $ is the row rank of $ A' $, which is $ r $.

Now let's think of the column rank of $ A $. Here we'll be considering only columns, so let's take $ A_j $ to just mean column $ A_{*j} $. For any elementary matrix $ E $, $ ( A_{i_1}, \cdots, A_{i_k} ) $ is a basis of the column space of $ A $ if and only if $ ( EA_{i_1}, \cdots, EA_{i_k} ) $ is a basis of the column space of $ EA $, so we need only focus on the column space of $ A' $. But $ ( A'_{j_1}, \cdots, A'_{j_r} ) $ is a basis of the column space of $ A' $, hence $ (A_{j_1}, \cdots, A_{j_r}) $ is a basis of the column space of $ A $. In particular, the column rank of $ A $ is also $ r $.

To summarise, an elementary row operation on $ A $ preserves both row space and dimension of column space, so we need only look at row and column ranks of its step form $ A' $. But both of these are just the number of pivots of $ A' $.


Since the rows of $A$ are the columns of the transpose of $A$, denoted $A^{t}$, the row space of $A$ equals the column space of $A^{t}$. Define $\operatorname{rank}(A)$ to mean the column rank of $A$: $\operatorname{col rank}(A) = \dim \{Ax: x \in \mathbb{R}^n\}$.

First we show that $A^{t}Ax = 0$ if and only if $Ax = 0$. This is standard linear algebra: If $Ax = 0$, then multiplying both sides by $A^{t}$ shows $A^{t}Ax = 0$. To prove the other direction, argue as follows:

$$A^{t}Ax=0 \implies x^{t}A^{t}Ax=0 \implies (Ax)^{t}(Ax) = 0 \implies Ax = 0$$

Therefore, the columns of $A^{t}A$ satisfy the same linear relationships as the columns of $A$. This is a crucial observation: if $Bx=0 \implies Ax=0$, then any linear relationship among the columns of $B$ is satisfied by the corresponding columns of $A$. If, in addition, $Ax=0 \implies Bx=0$, then the columns of the two matrices satisfy exactly the same linear relationships.

Define $B := A^{t}A$ (just for convenience of notation) and note that $B$ has the same number of columns as $A$. Suppose that $\{b_{j_1},\ldots,b_{j_r}\}$ is any collection of $r$ linearly independent columns in $B$. Now consider the corresponding columns in $A$: $\{a_{j_1},\ldots, a_{j_r}\}$. Then these must also be linearly independent. Why? Suppose, if possible, they were linearly dependent. Then one column from that set could be expressed as a linear combination of the others. But this would mean that the $\{b_{j_1},\ldots,b_{j_r}\}$ will satisfy the same linear relationship, which contradicts the linear independence of the set. This means that $A$ must have at least as many linearly independent columns as $B$. We can now reverse the argument exactly as above by considering a collection of linearly independent columns of $A$ and showing that $B$ must have at least as many. Therefore, $A$ and $B$ have the same number of linearly independent columns; hence, $\operatorname{col rank}(A) = \operatorname{col rank}(A^{t}A)$.

Next, observe that each column of $A^{t}A$ is a linear combination of the columns of $A^{t}$ so $\operatorname{col sp}(A^{t}A)$ is a subset of $\operatorname{col sp}(A^{t})$. Therefore, $\operatorname{col rank}(A^{t}A) \leq \operatorname{col rank}(A^{t})$ and, from what we proved above, $\operatorname{col rank}(A) \leq \operatorname{col rank}(A^{t})$.

Now simply apply the argument to $A^{t}$ to get the reverse inequality, proving $\operatorname{col rank}(A) = \operatorname{col rank}(A^{t})$. Since $\operatorname{col rank}(A^{t})$ is the row rank of A, we are done.


Given a $\DeclareMathOperator{\row}{row} \DeclareMathOperator{\col}{col} \DeclareMathOperator{\rowrank}{rowrank} \DeclareMathOperator{\colrank}{colrank}m\times n$ matrix $A$, we can consider "factorisations" of $A$ into a product of matrices $B$ and $C$, where $B$ has dimension $m \times r$, and $C$ has dimension $r \times n$. From the definition of matrix multiplication, we can derive the following two facts (I use $\col_j$ to denote the $j$-th column of a matrix, and similarly for rows): $$ \begin{align} \col_j(A) &= \sum_{i=1}^{r}c_{ij}\col_i(B) \tag1\label1 \, , \\[4pt] \row_i(A) &= \sum_{j=1}^{r}b_{ij}\row_j(C) \tag2\label2 \, . \end{align} $$

The column space of $A$ is thus a subspace of the column space of $B$, and the row space of $A$ is a subspace of the row space of $C$. Hence, $$ \DeclareMathOperator{\colrank}{colrank} \DeclareMathOperator{\rowrank}{rowrank} \begin{align} &\colrank(A) \le \colrank(B) \le \text{no. of columns in $B$} = r \, , \\[3pt] &\rowrank(A) \le \rowrank(C) \le \text{no. of rows in $C$} = r \, . \end{align} $$

Let us now consider "minimal" factorisations of $A$, where we try to make $r$ as small as possible. It is in fact always possible to find a factorisation where $r=\colrank(A)$: let $\mathbf{b}_1,\dots,\mathbf{b}_r$ be a basis of the column space of $A$, and put $\col_{j}(B)=\mathbf b_j$. Then, $\eqref1$ determines a unique $C$ such that $A=BC$. Hence, if $r$ is minimal, then $r=\colrank(A)$.

Similarly, we can always find a factorisation where $r=\rowrank(A)$: let $\mathbf c_1,\dots,\mathbf c_{r}$ be a basis of the row space of $A$, and put $\row_j(C)=\mathbf c_j$. Then, $\eqref2$ determines a unique $B$ such that $A=BC$. Hence, if $r$ is minimal, then $r=\rowrank(A)$, and the result follows.


Maybe this helps a bit with the intuition: When you transpose a matrix you don't change the dimension of the image. But when you transpose a matrix, the column rank becomes the row rank and vice versa. As the dimension of the image is the column rank those are equal.

Dominic Michaelis

Let us write

$$ Ax = \begin{pmatrix} a_{1,1} & \cdots & a_{1,m} \\ \vdots & & \vdots \\ a_{n,1} & \cdots & a_{n,m} \end{pmatrix} \begin{pmatrix} x_{1}\\ \vdots \\ x_{m} \end{pmatrix}, $$

and consider the equation $Ax = b$ as a system of linear equations.

Geometrically, in 2D it represents a family of lines passing through a common point of intersection, so given any two of the lines, the rest of the family can be derived:

$$L = L_{1} + \lambda L_{2}.$$

Similarly, in $n$ dimensions it can be rewritten as follows (assuming $n>m$ and that we have $r$ independent equations; the rest are linear combinations of these, so they can be eliminated or left in place):

$$ Ax = \begin{pmatrix} L_{1}\\ \vdots \\ L_{1} + \lambda_{1} L_{2} + \cdots + \lambda_{r} L_{r+1} \\ 0 \\ 0 \end{pmatrix} $$

Now let us compute the reduced row echelon form of $A$. In rref(A), each pivot element is equal to 1, and the entries above and below it are all zero.

Let's take an example to visualize this, say

$ b = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 6 & 7 & 8 & 10 \\ 2 & 3 & 4 & 6 \\ 7 & 8 & 9 & 0 \\ \end{pmatrix} $

$ rref(b) = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \\ \end{pmatrix} $

rref(b) can be visualized as first removing the dependent equations and then transforming what remains to reduced form.

There is a leading 1 in each nonzero row, and the columns containing these ones can be used to represent every other column.

So rank(R) = rank(C).
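The example can be checked numerically. The following is a minimal sketch in Python with exact fractions; the function name `rank` is a hypothetical helper that counts pivots found by Gaussian elimination. It is applied once to the example matrix and once to its transpose, so the first call counts independent rows and the second counts independent columns.

```python
from fractions import Fraction


def rank(rows):
    """Count the pivots found by forward Gaussian elimination (exact arithmetic)."""
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    r = 0                                   # number of pivots found so far
    for c in range(n):
        # Find a row at or below r with a nonzero entry in column c.
        i = next((i for i in range(r, m) if a[i][c] != 0), None)
        if i is None:
            continue                        # no pivot in this column
        a[r], a[i] = a[i], a[r]
        for k in range(r + 1, m):           # eliminate entries below the pivot
            f = a[k][c] / a[r][c]
            a[k] = [x - f * y for x, y in zip(a[k], a[r])]
        r += 1
        if r == m:
            break
    return r


M = [[1, 2, 3, 4], [6, 7, 8, 10], [2, 3, 4, 6], [7, 8, 9, 0]]
M_T = [list(col) for col in zip(*M)]

row_rank = rank(M)      # independent rows of M
col_rank = rank(M_T)    # independent rows of M^T, i.e. independent columns of M
print(row_rank, col_rank)
```

Both counts agree with the three leading ones visible in the reduced form above.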
