113

How can I prove $\operatorname{rank}A^TA=\operatorname{rank}A$ for any $A\in M_{m \times n}$?

This is an exercise in my textbook associated with orthogonal projections and the Gram-Schmidt process, but I am unsure how they are relevant.

Rodrigo de Azevedo
jaynp
  • Obligatory remark: This holds when $A\in\mathbb R^{m\times n}$, but not (for instance) when $A\in\mathbb C^{m\times n}$. – darij grinberg Apr 03 '13 at 04:31
  • See [this counter-example](http://math.stackexchange.com/a/682249/2420) for the case where $A\in\mathbb C^{m\times n}$. – M.S. Dousti Jan 13 '16 at 22:43
  • Or $A^T=(1~~\mathbf i)$. – Marc van Leeuwen Apr 27 '16 at 14:52
  • The conclusion also fails for fields of nonzero characteristic (finite fields in particular). See the comment by @Member below on the Accepted Answer, identifying the crucial step that $(Ax)^T(Ax) = 0$ implies $Ax=0$. – hardmath Jun 08 '17 at 17:44
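
To make the counterexamples mentioned in these comments concrete, here is a short worked computation. The complex matrix is the one from the comment above; the two finite-field matrices are illustrative choices of my own, not taken from the thread.

$$A=\begin{pmatrix}1\\ i\end{pmatrix}\in\mathbb C^{2\times 1}:\qquad A^TA = 1\cdot 1 + i\cdot i = 0, \qquad \operatorname{rank}(A^TA)=0<1=\operatorname{rank}(A).$$

$$A=\begin{pmatrix}1\\ 1\end{pmatrix}\in\mathbb F_2^{2\times 1}:\quad A^TA = 1+1 = 0, \qquad\qquad A=\begin{pmatrix}1\\ 2\end{pmatrix}\in\mathbb F_5^{2\times 1}:\quad A^TA = 1+4 = 0.$$

In each case $x=1$ gives $(Ax)^T(Ax)=0$ although $Ax\neq 0$. Over $\mathbb C$ the statement is recovered if the transpose is replaced by the conjugate transpose: for the first matrix, $A^*A = 1\cdot 1 + \bar{i}\cdot i = 2 \neq 0$.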

3 Answers

166

Let $\mathbf{x} \in N(A)$ where $N(A)$ is the null space of $A$.

So, $$\begin{align} A\mathbf{x} &=\mathbf{0} \\\implies A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x} &\in N(A^TA) \end{align}$$ Hence $N(A) \subseteq N(A^TA)$.

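Conversely, let $\mathbf{x} \in N(A^TA)$.

So, $$\begin{align} A^TA\mathbf{x} &=\mathbf{0} \\\implies \mathbf{x}^TA^TA\mathbf{x} &=0 \\\implies (A\mathbf{x})^T(A\mathbf{x})&=0 \\\implies A\mathbf{x}&=\mathbf{0}\\\implies \mathbf{x} &\in N(A) \end{align}$$ Hence $N(A^TA) \subseteq N(A)$. The fourth line uses that $A$ is real: $(A\mathbf{x})^T(A\mathbf{x}) = \|A\mathbf{x}\|^2$ is a sum of squares of real numbers, so it vanishes only when $A\mathbf{x}=\mathbf{0}$.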

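Therefore $$\begin{align} N(A^TA) &= N(A)\\ \implies \dim(N(A^TA)) &= \dim(N(A))\\ \implies \text{rank}(A^TA) &= \text{rank}(A)\end{align}$$ The last step is the rank-nullity theorem: $A$ and $A^TA$ both have $n$ columns, so each has rank equal to $n$ minus the dimension of its null space.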
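
As a sanity check on the argument above, the following is a minimal sketch that compares the two ranks in exact rational arithmetic. The helper names `rank`, `transpose`, `matmul` and the example matrix are my own assumptions, not part of the answer; a numerical library would normally be used instead.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix (list of rows) via Gaussian elimination over the rationals."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate the entries below the pivot.
        for i in range(r + 1, n_rows):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


# Hypothetical 3x4 example: row 3 = row 1 + row 2, so rank(A) = 2.
A = [[1, 2, 0, 1],
     [0, 1, 1, 3],
     [1, 3, 1, 4]]
gram = matmul(transpose(A), A)  # the 4x4 matrix A^T A
print(rank(A), rank(gram))
```

Exact fractions are used so that the rank is not affected by floating-point round-off; the check only covers rational (hence real) entries, which is the setting in which the proof applies.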

Empiricist
A.D
18

Let $r$ be the rank of $A \in \mathbb{R}^{m \times n}$. We then have the (reduced) SVD of $A$ as $$A_{m \times n} = U_{m \times r} \Sigma_{r \times r} V^T_{r \times n}$$ where $U$ and $V$ have orthonormal columns and $\Sigma$ is diagonal with positive entries. Since $U^TU = I_r$, this gives $A^TA$ as $$A^TA = V_{n \times r} \Sigma_{r \times r}^2 V^T_{r \times n}$$ which is nothing but the SVD of $A^TA$. From this it is clear that $A^TA$ also has rank $r$. In fact, the nonzero singular values of $A^TA$ are nothing but the squares of the singular values of $A$.
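
A small worked instance may help; the matrix below is an illustrative choice of mine, not part of the answer. Take the rank-one matrix

$$A = \begin{pmatrix}1&1\\1&1\end{pmatrix} = \underbrace{\tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix}}_{U}\;\underbrace{(2)}_{\Sigma}\;\underbrace{\tfrac{1}{\sqrt2}\begin{pmatrix}1&1\end{pmatrix}}_{V^T}.$$

Then

$$A^TA = \begin{pmatrix}2&2\\2&2\end{pmatrix} = \tfrac{1}{\sqrt2}\begin{pmatrix}1\\1\end{pmatrix}\,(4)\,\tfrac{1}{\sqrt2}\begin{pmatrix}1&1\end{pmatrix} = V\Sigma^2V^T,$$

so $A^TA$ again has rank $1$, and its only nonzero singular value is $4 = 2^2$.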

  • Note that Strang's textbook actually uses the fact that there are $r$ nonzero eigenvalues of $A^TA$, i.e. $\operatorname{rank}(A^TA)=\operatorname{rank}(A)$, to decide the size of $\Sigma_{r \times r}$ and prove the SVD. To avoid a circular argument here, a different proof of the SVD would be required. – Weishi Z Jan 06 '21 at 12:03
4

Since elementary operations do not change the rank of a matrix, we have $\text{rank}(A^TA) = \text{rank}(E^TA^TAE)$, where $E$ is a product of elementary matrices chosen so that $AE = [A_1, A_2]$, where $A_1$ has full column rank and $\text{rank}(A_1) = \text{rank}(A)$.

Since every column of $A_2$ lies in the column space of $A_1$, we can find a matrix $P$ such that $A_1P= A_2$ and $AE = [A_1, A_1P] = A_1[I, P]$.

Thus $\text{rank}(E^TA^TAE) = \text{rank}\left((A_1[I, P])^T(A_1[I, P])\right) = \text{rank}\left([I, P]^T(A_1^TA_1)[I, P]\right)$. Over the reals, $A_1^TA_1$ is invertible because $A_1$ has full column rank, $[I, P]^T$ has full column rank, and $[I, P]$ has full row rank, so the product has rank $\text{rank}(A_1) = \text{rank}(A)$. Hence on a real space $\text{rank}(A^TA) = \text{rank}(A)$, completing the proof.
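
To illustrate the factorization $AE = A_1[I, P]$ on a small case (the matrix is my own choice, and no column operations are needed, so $E = I$):

$$A = \begin{pmatrix}1&2\\2&4\end{pmatrix},\qquad A_1 = \begin{pmatrix}1\\2\end{pmatrix},\qquad P = (2),\qquad A = A_1[I, P] = \begin{pmatrix}1\\2\end{pmatrix}\begin{pmatrix}1&2\end{pmatrix}.$$

Here $A_1^TA_1 = 1^2 + 2^2 = 5 \neq 0$, and

$$A^TA = [I, P]^T(A_1^TA_1)[I, P] = 5\begin{pmatrix}1&2\\2&4\end{pmatrix} = \begin{pmatrix}5&10\\10&20\end{pmatrix},$$

which has rank $1 = \text{rank}(A)$. The step that needs real entries is $A_1^TA_1 \neq 0$: for a complex column such as $A_1 = (1, i)^T$ it fails, since $1 + i^2 = 0$.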

user26857
Xiangru Lian
  • I cannot decipher what is said here, but it must be wrong since it never uses that the matrices are over $\Bbb R$ (or more generally an ordered field) rather than for instance over $\Bbb C$ where the result is not true. – Marc van Leeuwen Apr 27 '16 at 14:38
  • The last theorem actually implicitly uses the fact that the matrices are over a real space. Thank you for pointing that out. I have added that prerequisite to my answer. – Xiangru Lian Apr 30 '16 at 04:47
  • An alternative simple way to see it: Matrix $A^T$ may be reduced to its reduced row-echelon form, $R$, by $PA^T=R$, where $P$ is the product of a sequence of elementary matrices. So, $A^T=P^{-1}R$ and hence $$\mathrm{rank}(A^TA)=\mathrm{rank}(P^{-1}RR^T(P^{-1})^T)=\mathrm{rank}(RR^T).$$ The result then follows easily from this, since clearly $\mathrm{rank}(RR^T)=\mathrm{rank}(R)=\mathrm{rank}(A).$ – syeh_106 Jan 14 '17 at 02:07
  • @syeh_106 Why is it that $\operatorname{rank}(RR^T)=\operatorname{rank}(R)$? I'm sorry if this is too basic. – JPYamamoto Aug 17 '20 at 19:08
  • @JPYamamoto If $R^T$ is full column rank, this is clearly true: $RR^Tx=0 \Rightarrow x^TRR^Tx=\Vert R^Tx\Vert^2=0 \Rightarrow R^Tx = 0 \Rightarrow x = 0$, i.e. $RR^T$ is still full column rank. Otherwise, $R^T= [R_1^T\, 0]$, where $R_1^T$ is full column rank, and it's easily verified that $\mathrm{rank}(RR^T)=\mathrm{rank}(R_1R_1^T)=\mathrm{rank}(R_1)=\mathrm{rank}(R).$ – syeh_106 Aug 19 '20 at 04:33