17

I know that for a matrix $A$, if $\det(A)=0$ then the matrix does not have an inverse, and hence the associated system of equations does not have a unique solution. However, why do the determinant formulas have the form they do? Why all the complicated co-factor expansions and alternating signs?

To sum it up: I know what determinants do, but it's unclear to me why. Is there an intuitive explanation that can be attached to a co-factor expansion?

  • 1
    [What's an intuitive way to think about the determinant?](http://math.stackexchange.com/q/668/) gives many answers about intuition for determinants – Marc van Leeuwen Jul 03 '14 at 05:05
  • @MarcvanLeeuwen Thanks! That is a massive thread, but I actually found your detailed response there very, very helpful! –  Jul 03 '14 at 12:24
  • @Eupraxis1981: I'm not sure if you were really looking for a proof of the cofactor expansion method or if you were instead looking for the *meaning* of a determinant, but if it's the latter case, I think my answer should answer your question quite directly and elegantly, so I encourage you to take a look at it. (But if it's the former please let me know.) – user541686 Jul 03 '14 at 19:26

8 Answers

11

The determinant is actually determined by a few simple rules, as it is (up to multiplication by a constant) the only multilinear antisymmetric functional on the space of matrices.

We don't settle on a very complicated definition by choice; rather, we take two simple properties (multilinearity and antisymmetry) and see what they force. And what we get is slightly complicated (but very useful).
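As a minimal sketch of how the familiar formula falls out of those two rules, take $n=2$, regard $f$ as a function of the two columns of $\begin{pmatrix}a&b\\c&d\end{pmatrix}$, and expand in the standard basis:
$$f(ae_1+ce_2,\; be_1+de_2)=ab\,f(e_1,e_1)+ad\,f(e_1,e_2)+cb\,f(e_2,e_1)+cd\,f(e_2,e_2)=(ad-bc)\,f(e_1,e_2),$$
using $f(e_1,e_1)=f(e_2,e_2)=0$ and $f(e_2,e_1)=-f(e_1,e_2)$. Normalizing $f(e_1,e_2)=1$ yields exactly $ad-bc$; the cofactor expansion is the same bookkeeping carried out for general $n$.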

5xum
  • 114,324
  • 6
  • 115
  • 186
  • +1. And just to note, the two (three?) defining properties can be justified (at least for real matrices) by demanding that the determinant be the signed volume of the parallelepiped spanned by the columns of the matrix. –  Jul 02 '14 at 16:08
  • @AsalBeagDubh True, but that is another consequence of the definition, not the definition itself. – 5xum Jul 02 '14 at 16:09
  • Dear 5xum, what I meant is that, at least at an intuitive level, one could start with "signed volume" as the definition, and deduce the properties you mention from that. (That is one way to answer the question "but why _these_ two defining rules?".) –  Jul 02 '14 at 16:12
  • The multilinearity and alternating property of the determinant suggest a connection with the [exterior algebra](https://en.wikipedia.org/wiki/Exterior_algebra) of the vector space in question. And indeed there is such a connection (q.v. the linked article on Wikipedia). – kahen Jul 02 '14 at 16:15
  • @AsalBeagDubh I agree with you. I can see now how one can develop the determinant formulas from a given set of assumed properties. However, why do we care to find a multilinear and antisymmetric functional in the first place? I wonder what the *historic* motivation was for the determinant formulas? –  Jul 02 '14 at 16:26
  • @AsalBeagDubh On a related note, I think the motivation you offered is the closest I've ever come to a clear motivation for determinants qua determining singular vs nonsingular: The column vectors in an $N\times N$ matrix serve as a basis for $R^N$ iff their $N$-dimensional parallelepiped has non-zero signed volume. Hence, the formulas are merely algorithmic approaches to calculating this volume. The fact that you can also use the actual *values* of the determinant in useful ways seems like a knock-on benefit from their primary use as "indicator functionals" (i.e., $1_{0}(Det(A))$) –  Jul 02 '14 at 16:32
  • @AsalBeagDubh That is a very good point. It might be a good idea to put it in an answer of your own (I promise an upvote:P) – 5xum Jul 02 '14 at 20:44
  • You **must** qualify "multilinear" to either "multilinear by rows" or "multilinear by columns", because these are very different notions. It happens that you can use either of them to characterise the determinant, but that is not a justification for not specifying which one you are using. – Marc van Leeuwen Jul 03 '14 at 05:09
11

Two exercises that may give you the answer you need (no work, no gain):

  1. Assume you have a square $[0,1]\times [0,1]$ in the $(x,y)$-plane. Assume for some reason you need to change the variables you are using. The new variables are now $w=a x + b y$ and $z=c x + d y$, where $a,b,c$ and $d$ are numbers. What is the area of the original square in the new coordinate system, the $(w,z)$-plane? (A numerical sanity check is sketched after this list.)
  2. A multi-linear mapping in $\mathbb{R}^2$ (bilinear in this case) is a function, $M:\mathbb{R}^2\times \mathbb{R}^2\rightarrow \mathbb{R}$ such that $M( ax+b \hat x, y)= a M(x,y)+b M(\hat x,y)$ and $M(x,a y + b\hat y)=a M(x,y)+bM(x,\hat y)$. The map is alternating if $M(x,y)=-M(y,x)$. These two properties are very useful. Exercise: Show that if $M$ has these properties then $M(x,y)=k\cdot \det\pmatrix{x_1 & y_1 \\ x_2 & y_2}$.
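For Exercise 1, here is a quick numerical sanity check (not a proof; the coefficients below are arbitrary sample values): map the corners of the unit square into the $(w,z)$-plane and compare the area of the image, computed with the shoelace formula, against $|ad-bc|$.

```python
# Hedged sketch: verify Exercise 1 numerically for one arbitrary choice of a, b, c, d.
a, b, c, d = 2.0, 1.0, 0.5, 3.0

# Corners of the unit square [0,1] x [0,1], listed counter-clockwise.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# Image of each corner under w = a*x + b*y, z = c*x + d*y.
image = [(a*x + b*y, c*x + d*y) for x, y in square]

def polygon_area(pts):
    """Area of a simple polygon via the shoelace formula."""
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return abs(s) / 2

print(polygon_area(image))   # 5.5
print(abs(a * d - b * c))    # 5.5 -- they agree
```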
Sergio Parreiras
  • 3,663
  • 1
  • 18
  • 40
  • I understand that these show how determinants arise, but they are called "determinants" because they determine the solvability of a linear system, hence I am looking for the motivation vis a vis that goal. –  Jul 02 '14 at 16:35
  • Do you have a reference for the claim "they are called *determinants* because they determine the solvability of a linear system", or is it just your conjecture? – Sergio Parreiras Jul 02 '14 at 16:45
  • 1
    In the context of a linear system $Ax=b$, when $A$ is not full-rank then we can take linear combinations of its rows and columns so that we either get redundant equations (multiple solutions) or an inconsistent system (no solutions). To diagnose if this is the case, we need to have a test function that can tell us whether a row (or column) is a linear combination of other rows (or columns). A multi-linear alternating function is the easiest way (the only way?) to achieve that. – Sergio Parreiras Jul 02 '14 at 16:55
  • 1
    Conjecture - whoever named them probably did not say why - I'm thinking of the analogy with the quadratic determinant $b^2-4ac$, which determines if there are any real solutions. –  Jul 02 '14 at 16:56
  • Thanks for your expanded comment. This seems to be the key motivation behind the determinant...we want a formula that signals when a system is solvable, and hence, per Hagen von Eitzen's response below, such a function will indeed be 0 if one of its arguments is equivalent to another. Correct? –  Jul 02 '14 at 16:59
  • 1
    Correct: $M(a x,x)= a M(x,x)$ by multi-linearity and $M(x,x)=-M(x,x)$ by being alternating, so $2\,M(x,x)=0$ and also $M(a x,x)=0$. – Sergio Parreiras Jul 02 '14 at 17:02
  • +1 Excellent! Thanks for closing the "theoretical loop" on this. –  Jul 02 '14 at 17:44
  • 1
    @Eupraxis1981: $b^2-4ac$ and things like it are called [discriminants](http://en.wikipedia.org/wiki/Discriminant). –  Jul 02 '14 at 18:30
  • 2
    @StevenTaschuk: historically this has not always been the case, "The term 'determinant' was first introduced by Gauss in Disquisitiones arithmeticae (1801) while discussing quadratic forms. He used the term because **the determinant determines the properties of the quadratic form**. However the concept is not the same as that of our determinant." http://www-groups.dcs.st-and.ac.uk/~history/HistTopics/Matrices_and_determinants.html – Sergio Parreiras Jul 02 '14 at 19:07
  • 2
    @Eupraxis1981 : you may find the answer to this question http://math.stackexchange.com/questions/81521/development-of-the-idea-of-the-determinant also useful; it connects the geometric view with the algebraic one. – Sergio Parreiras Jul 02 '14 at 19:15
  • @StevenTaschuk in fairness to Steven, I simply got it mistaken in my head :-\..but let's pretend I know as much as Sergio ;-P –  Jul 03 '14 at 12:16
5

Let $V$ be an $n$-dimensional vector space. You can consider the set $W_n$ (there is a technical symbolism used for this space, which I will not bother you with) of maps $f\colon V^n\to \mathbb R$ with the following conditions:

  • $f$ is multilinear, that is: linear in each argument: $$f(v_1, \ldots, v_{i-1},av_i+bu_i, v_{i+1},\ldots, v_n)=af(v_1, \ldots, v_{i-1},v_i, v_{i+1},\ldots, v_n)+bf(v_1, \ldots, v_{i-1},u_i, v_{i+1},\ldots, v_n)$$
  • $f$ is alternating, i.e. $f(v_1,\ldots v_n)=0$ whenever $v_i=v_j$ for some $i\ne j$.

The set $W_n$ is a vector space (under the obvious addition and scalar multiplication), and we can wonder what its dimension is. Intriguingly, $\dim W_n=1$. As a consequence, each $f\in W_n$ is already determined by its value at one nontrivial point. In particular, there is a unique $f$ with the property that $f(e_1,\ldots, e_n)=1$ (where the $e_i$ are the standard basis vectors). Then, for any $n\times n$ matrix $A$ with columns $v_1,\ldots, v_n$, one can show that we simply have $\det A=f(v_1,\ldots, v_n)$. Hence all the idiosyncrasies of $\det$ are not so unusual at all: they come naturally from the simple requirements of being multilinear and alternating (and these are also naturally related to solubility of linear equations).
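To see the advertised link with solubility, here is a quick sketch. An alternating multilinear $f$ vanishes on every linearly dependent tuple: if, say, $v_1=\sum_{j\ge 2}c_jv_j$, then
$$f(v_1,v_2,\ldots,v_n)=\sum_{j\ge 2}c_j\,f(v_j,v_2,\ldots,v_n)=0,$$
since each term on the right has a repeated argument. So $\det A=0$ whenever the columns of $A$ are linearly dependent, i.e. whenever $Ax=0$ has a nontrivial solution; with a little more work, $\dim W_n=1$ gives the converse as well.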

John Bentin
  • 16,449
  • 3
  • 39
  • 64
Hagen von Eitzen
  • 1
  • 29
  • 331
  • 611
  • 1
    Thanks! This is a good formal answer, but I don't see why one would care to seek such a functional in the first place. I doubt the originators of determinants were thinking "Gee, I wish we had an alternating-multilinear functional." –  Jul 02 '14 at 16:36
3

So a quick consultation of Wikipedia suggests that the determinant was used for linear systems long before it was understood that those systems could be written in terms of matrices.

Nowadays, of course, we know we can write such systems in terms of matrices, or more broadly as linear operators. It turns out that a simple extension of a linear operator to the exterior algebra reproduces the determinant:

The exterior algebra uses a wedge product, denoted with $\wedge$, and wedge products of several vectors allow us to treat planes, volumes, and such as algebraic elements.

The natural extension of a linear operator across the wedge product is to wedge every vector in that product: so given a wedge product $a \wedge b \wedge c \wedge \ldots$, the natural extension of a linear operator $\underline T$ is

$$\underline T(a \wedge b \wedge c \wedge \ldots) \equiv \underline T(a) \wedge \underline T(b) \wedge \underline T(c) \wedge \ldots$$

That is, have the operator act on each vector individually, then compute the wedge.

When you do this with the highest-graded wedge product, a wedge product of $n$ vectors in an $n$-dimensional space, the action of the linear operator reduces to a scalar multiplication. Let the $n$-vector be denoted $i$, and we get

$$\underline T(i) = (\det \underline T)i$$

This expresses how the "volume" of the space changes orientation and is dilated or shrunk under the transformation. The antisymmetry of the wedges captures exactly the same alternating plus and minus signs typically used to compute the determinant.
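For instance, here is the two-dimensional case written out as a sketch. If $\underline T(e_1)=ae_1+ce_2$ and $\underline T(e_2)=be_1+de_2$, then
$$\underline T(e_1\wedge e_2)=\underline T(e_1)\wedge\underline T(e_2)=(ae_1+ce_2)\wedge(be_1+de_2)=(ad-bc)\,e_1\wedge e_2,$$
using $e_1\wedge e_1=e_2\wedge e_2=0$ and $e_2\wedge e_1=-e_1\wedge e_2$. The familiar $2\times2$ formula appears as the scale factor on the unit area element.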

Edit: Why does the change in the volume matter? Well, you can grasp how this relates to invertibility: if the linear operator maps nonzero volume to nonzero volume, then there can be a bijection between vectors; but if the operator maps volume to zero volume, then the image is something of lower dimension than the full space, and as with any projection, multiple vectors are mapped to the same output vector, so such a map cannot be invertible.

Muphrid
  • 18,790
  • 1
  • 23
  • 56
  • One thing that one has to verify regarding $T(\bigwedge_{i=1}^n v_i) := \bigwedge_{i=1}^n T(v_i)$ is that there could be different vectors $w_1,\dotsc,w_n$ such that $\bigwedge_{i=1}^n v_i = \bigwedge_{i=1}^n w_i$, so you have to check that you get the same result on the right hand side regardless of the choices you make. – kahen Jul 02 '14 at 23:30
  • Outermorphisms are well-defined. That is if $\bigwedge_{i=1}^n v_i = \bigwedge_{i=1}^n w_i$, then $\bigwedge_{i=1}^n T(v_i) = \bigwedge_{i=1}^n T(w_i)$. This is easy enough to show, after choosing a basis, from the linearity of $T$. –  Jul 03 '14 at 19:40
2

Because the determinant is just the signed volume of the parallelotope whose sides are the column vectors, its absolute value can be computed as (and the determinant formula can be derived from) the product of the norms of the vectors produced by the Gram-Schmidt process applied to the column vectors -- or at least it can when the column vectors are linearly independent. Note that this process finds the volume of an orthotope, but it is the same as the volume of the original parallelotope because of Cavalieri's Principle. The sign is then determined by whether the column vectors form a right-handed or left-handed arrangement.
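Here is one quick numerical check of that claim, using the QR factorization (the diagonal of $R$ records, up to sign, the lengths of the Gram-Schmidt vectors); the matrix below is just a random sample:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))     # a random 4x4 matrix; columns almost surely independent

# QR factorization: Q has orthonormal columns, and |R[i, i]| is the norm of the
# i-th vector produced by Gram-Schmidt applied to the columns of A.
Q, R = np.linalg.qr(A)

print(np.prod(np.abs(np.diag(R))))  # product of Gram-Schmidt norms
print(abs(np.linalg.det(A)))        # agrees up to floating-point error
```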

user161280
  • 59
  • 5
  • Thanks for your intuitive explanation. However, *why* do we care about this volume? –  Jul 02 '14 at 16:33
  • Why do we care about any bit of math? I care because it's interesting. Maybe you're more applications-minded? In that case, this gives a good geometric interpretation of what the determinant is. And if you're familiar with the Gram-Schmidt process, it also gives a means of finding it without having to memorize formulas about cofactor matrices. As to what we use the determinant to calculate the volumes of, check out the Jacobian determinant if you haven't encountered it already. It is VERY important in multivariable calculus. – user161280 Jul 02 '14 at 16:39
  • Honestly though, I just added this answer because I didn't see it represented here. But, whenever I think about a determinant, I think about it in the way described by Muphrid. -- I don't know if you're familiar with exterior algebra (or its generalization, geometric algebra), but it's worth checking out if you're interested in this stuff. – user161280 Jul 02 '14 at 16:43
1

Suppose you want to tell whether a square matrix is invertible. You may try to come up with a function f on the space of square matrices which is "as simple as possible" and has the property that f(A)=0 if and only if A is *not* invertible. What can be simpler than a polynomial function of the matrix coefficients? So, you try to find a polynomial function. How do you measure the complexity of a polynomial? Probably by its degree. Thus, you look for a polynomial of minimal degree which has this property. Then you discover

Theorem. The only minimal degree polynomials f with the required property are scalar multiples of the determinant.

What is left is to pick the right scalar multiple. You decide (a bit arbitrarily) to require that f(I)=1, where I is the identity matrix. Now, you conclude that f is the determinant.
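For illustration, here is a quick check in the $2\times 2$ case of why degree one is not enough: a degree-one polynomial $f\begin{pmatrix}a&b\\c&d\end{pmatrix}=\alpha a+\beta b+\gamma c+\delta d+\varepsilon$ that vanishes on every singular matrix must vanish on the zero matrix (so $\varepsilon=0$) and on each of the four matrices with a single nonzero entry (so $\alpha=\beta=\gamma=\delta=0$), leaving only the zero polynomial, which vanishes on invertible matrices too. The degree-two polynomial $ad-bc$ does the job, and, by the theorem above, any other minimal-degree choice is a scalar multiple of it.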

Moishe Kohan
  • 83,864
  • 5
  • 97
  • 192
  • Thanks. This is in line with what I was looking for too. I know the geometric interpretation, but the underlying motivation for them seems to be a convenient "marker" for solvability (and they *happen* to have a lot of other nice properties). –  Jul 03 '14 at 12:09
1

Of course there are many good answers. However, I think what I post below adds some value. I show how the determinant might be discovered merely as a consequence of systematic row reduction in the search for a criterion for invertibility. The conclusion of the derivation is by no means unique, and that is where volume and orientability come into play. I don't have much to say about those, as they have been addressed in other answers already. What follows is actually taken from my 2014 Linear Algebra notes.

The base case $n=1$ has $A=a \in \mathbb{R}$, as we identify $\mathbb{R}^{1 \times 1}$ with $\mathbb{R}$. The equation $ax=b$ has solution $x=b/a$ provided $a \neq 0$. Thus, the simple criterion in the $n=1$ case is merely that $\boxed{a \neq 0}$.

The $n=2$ case has $A = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right]$. We learned that the formula for the $2 \times 2$ inverse is: $$ A^{-1} = \frac{1}{ad-bc}\left[ \begin{array}{cc} d & -b \\ -c & a \end{array} \right]. $$ The necessary and sufficient condition for invertibility here is just that $ad-bc \neq 0$. That said, it may be helpful to derive this condition from row reduction. For brevity of discussion (you could break into further cases if you want a more complete motivating discussion; our current endeavor is to explain why the determinant formula is natural) we assume $a,c \neq 0$. $$ A = \left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] \ \underrightarrow{ \ cr_1, ar_2 \ } \ \left[ \begin{array}{cc} ac & bc \\ ac & ad \end{array} \right] \ \underrightarrow{ \ r_2- r_1 \ } \ \left[ \begin{array}{cc} ac & bc \\ 0 & ad-bc \end{array} \right] $$ Observe that $\boxed{ad-bc \neq 0}$ is a necessary condition to reduce the matrix $A$ to the identity.

The $n=3$ case has $A = \left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right]$. I assume here for brevity that $a,b,c,d,e,f \neq 0$. \begin{align} \notag A = \left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right] \ \ &\underrightarrow{ \ bcr_1, \ acr_2,\ abr_3 \ } \ \left[ \begin{array}{c|c|c} abc & dbc & gbc \\ acb & ace & ach \\ abc & abf & abi \end{array} \right] \\ \notag &\underrightarrow{ \ r_2-r_1, \ r_3-r_1 \ } \ \left[ \begin{array}{c|c|c} abc & dbc & gbc \\ 0 & c(ae-db) & c(ah-gb) \\ 0 & b(af-dc) & b(ai-gc) \end{array} \right] \\ \notag &\underrightarrow{ \ r_1/(bc), \ r_2/c, r_3/b \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & ae-db & ah-gb \\ 0 & af-dc & ai-gc \end{array} \right] \\ \notag &\underrightarrow{ \ r_2/(ae-db) \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & af-dc & ai-gc \end{array} \right] \\ \notag &\underrightarrow{ \ r_3- (af-dc)r_2 \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & 0 & ai-gc -(af-dc)\frac{ah-gb}{ae-db} \end{array} \right] \\ \notag &\underrightarrow{ \ (ae-db)r_3 \ } \ \left[ \begin{array}{c|c|c} a & d & g \\ 0 & 1 & \frac{ah-gb}{ae-db} \\ 0 & 0 & (ai-gc)(ae-db) -(af-dc)(ah-gb) \end{array} \right] \\ \notag \end{align} Apparently, we need $(ai-gc)(ae-db) -(af-dc)(ah-gb) \neq 0$. Let's see if we can simplify it, \begin{align} \notag (ai-gc)(ae-db) -(af-dc)(ah-gb) &= a^2ie-aidb-gcae+gcdb-a^2fh+afgb+dcah-dcgb\\ \notag &= a[aie-idb-gce-afh+fgb+dch] \end{align} We already assumed $a \neq 0$ so it is most interesting to require: $$ \boxed{aie-idb-gce-afh+fgb+dch \neq 0} $$ The condition above would seem to yield invertibility of $A$. To be careful, the calculation above does not prove anything about matrices for which the above row operations are forbidden. Technically, you'd need to examine those cases separately to prove the boxed criterion suffices for invertibility of $A$. That said, perhaps this section helps motivate why we define the following determinants: \begin{align} \notag \text{det}[a] &= a, \\ \notag \text{det}\left[ \begin{array}{cc} a & b \\ c & d \end{array} \right] &=ad-bc, \\ \notag \text{det}\left[ \begin{array}{ccc} a & d & g \\ b & e & h \\ c & f & i \end{array} \right] &= aie-idb-gce-afh+fgb+dch \end{align} If $x \neq 0$ then $-x \neq 0$, thus the invertibility criterion alone does not suffice to uniquely determine the determinant. We'll see in a later section that the choice of sign has geometric significance. If a set of $n-1$ vectors $v_1,\dots, v_{n-1}$ spans a hyperplane in $\mathbb{R}^n$ and we consider $\text{det}[v_1|\cdots |v_{n-1} |w]$ for some vector $w$ then the determinant is positive if $w$ is on one side of the hyperplane and it is negative if $w$ is on the other side. If $w$ is on the hyperplane then the determinant is zero. These facts serve to determine the definition of the determinant in general.
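If you would like to double-check the algebra above without redoing it by hand, here is an optional sketch using SymPy (assuming it is installed); it confirms both the factoring-out of $a$ and that the boxed expression is the usual $3\times 3$ determinant.

```python
import sympy as sp

a, b, c, d, e, f, g, h, i = sp.symbols('a b c d e f g h i')

# Pivot produced by the row reduction, before simplification.
pivot = (a*i - g*c)*(a*e - d*b) - (a*f - d*c)*(a*h - g*b)

# Boxed criterion, after factoring out a.
boxed = a*i*e - i*d*b - g*c*e - a*f*h + f*g*b + d*c*h

print(sp.expand(pivot - a*boxed))   # 0: the factorization step is correct

A = sp.Matrix([[a, d, g], [b, e, h], [c, f, i]])
print(sp.expand(A.det() - boxed))   # 0: the boxed criterion is det(A)
```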

James S. Cook
  • 16,540
  • 3
  • 42
  • 100
  • Wow...I really like your direct approach. Thanks for contributing to this! –  Jul 03 '14 at 20:26
0

This is just an addendum to Hagen von Eitzen's excellent answer. Historically, determinants arose as a way of writing down the solution to a system of linear equations. In modern notation, such a system may be written $$A\boldsymbol x=\boldsymbol b,$$ where $A$ is an $m\times n$ real matrix, $\boldsymbol b$ is a known $m$-vector, and $\boldsymbol x$ is the $n$-vector to be determined. A necessary condition for the system to have a unique solution is that $m\geqslant n$. In fact, if $m>n$ and the system is consistent, then at least $m-n$ of the equations are redundant. So the interesting case is $m=n$. Even then, the system may be inconsistent (and perhaps redundant as well), in which case it has no solution; or it may be consistent but redundant, when the solution is not determined. The key case is when the system is square ($m=n$), consistent, and irredundant. Then the solution is given by $$\boldsymbol x=A^{-1}\boldsymbol b,$$ where $A^{-1}$ is the usual matrix inverse. Determinants arise as the entries of $A^{-1}$: each entry is a ratio, all entries having the common denominator $\det A$, which must therefore be nonzero. The numerators are the cofactors of $A$, which are also determinants. The properties of multilinearity and alternation of $\det$ correspond to familiar properties of linear equations; for example, replacing one equation by itself plus a linear combination of other equations doesn't change the solution (or $\det A$).
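As a concrete $2\times2$ illustration of "entries of $A^{-1}$ as ratios of determinants" (this is just Cramer's rule for $n=2$): writing $A=\begin{pmatrix}p&q\\r&s\end{pmatrix}$ and $\boldsymbol b=(b_1,b_2)$,
$$A^{-1}=\frac{1}{ps-qr}\begin{pmatrix}s&-q\\-r&p\end{pmatrix},\qquad x_1=\frac{\det\begin{pmatrix}b_1&q\\b_2&s\end{pmatrix}}{\det A},\qquad x_2=\frac{\det\begin{pmatrix}p&b_1\\r&b_2\end{pmatrix}}{\det A},$$
so each numerator is built from cofactors of $A$, and the shared denominator $\det A$ must be nonzero.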

John Bentin
  • 16,449
  • 3
  • 39
  • 64