216

I happened to stumble upon the following matrix: $$ A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix} $$

And after trying a bunch of different examples, I noticed the following remarkable pattern. If $P$ is a polynomial, then: $$ P(A)=\begin{bmatrix} P(a) & P'(a) \\ 0 & P(a) \end{bmatrix}$$

where $P'(a)$ is the derivative of $P$ evaluated at $a$.

Furthermore, I tried extending this to other matrix functions, for example the matrix exponential, and Wolfram Alpha tells me: $$ \exp(A)=\begin{bmatrix} e^a & e^a \\ 0 & e^a \end{bmatrix}$$ and this does in fact follow the pattern, since the derivative of $e^x$ is $e^x$ itself!

I also looked at the function $P(x)=\frac{1}{x}$. If we interpret the reciprocal of a matrix as its inverse, then we get: $$ P(A)=\begin{bmatrix} \frac{1}{a} & -\frac{1}{a^2} \\ 0 & \frac{1}{a} \end{bmatrix}$$ And since $P'(a)=-\frac{1}{a^2}$, the pattern still holds!

After trying a few more examples, it seems that this pattern holds whenever $P$ is a rational function.
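A minimal SymPy sketch (my own check, assuming only symbolic matrix powers and inverses; the sample polynomial is arbitrary) that reproduces both observations:

```python
import sympy as sp

a, x = sp.symbols('a x')
A = sp.Matrix([[a, 1], [0, a]])

# A sample polynomial P(x) = x^3 - 2x + 5, evaluated at the matrix A.
P = x**3 - 2*x + 5
P_of_A = A**3 - 2*A + 5*sp.eye(2)
expected = sp.Matrix([[P.subs(x, a), sp.diff(P, x).subs(x, a)],
                      [0,            P.subs(x, a)]])
print(sp.simplify(P_of_A - expected))  # zero matrix

# P(x) = 1/x, with the reciprocal of A read as the matrix inverse.
print(sp.simplify(A.inv() - sp.Matrix([[1/a, -1/a**2], [0, 1/a]])))  # zero matrix
```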

I have two questions:

  1. Why is this happening?

  2. Are there any other known matrix functions (which can also be applied to real numbers) for which this property holds?

ASKASK
  • This is related to the 'dual numbers': interpret this matrix as $A = a + \epsilon$, where $\epsilon^2=0$ and then $f(A) = f(a+\epsilon) = f(a) + \epsilon f'(a) + \tfrac12 \epsilon^2 f''(a) + \dots = f(a) + \epsilon f'(a)$. – Myself Nov 22 '15 at 22:44
  • *Very* closely related, in fact, since the dual numbers $\mathbb{R}[\epsilon]$ are isomorphic (as a topological ring) to the matrix ring $\mathbb{R}[A]$. –  Nov 23 '15 at 07:36

6 Answers

152

If $$ A = \begin{bmatrix} a & 1 \\ 0 & a \end{bmatrix} $$ then by induction you can prove that $$ A^n = \begin{bmatrix} a^n & n a^{n-1} \\ 0 & a^n \end{bmatrix} \tag 1 $$ for $n \ge 1$. If $f$ can be developed into a power series $$ f(z) = \sum_{n=0}^\infty c_n z^n $$ then $$ f'(z) = \sum_{n=1}^\infty n c_n z^{n-1} $$ and it follows that $$ f(A) = \sum_{n=0}^\infty c_n A^n = c_0 I + \sum_{n=1}^\infty c_n \begin{bmatrix} a^n & n a^{n-1} \\ 0 & a^n \end{bmatrix} = \begin{bmatrix} f(a) & f'(a) \\ 0 & f(a) \end{bmatrix} \tag 2 $$ From $(1)$ and $$ A^{-1} = \begin{bmatrix} a^{-1} & -a^{-2} \\ 0 & a^{-1} \end{bmatrix} $$ one gets $$ A^{-n} = \begin{bmatrix} a^{-1} & -a^{-2} \\ 0 & a^{-1} \end{bmatrix}^n = (-a^{-2})^{n} \begin{bmatrix} -a & 1 \\ 0 & -a \end{bmatrix}^n \\ = (-1)^n a^{-2n} \begin{bmatrix} (-a)^n & n (-a)^{n-1} \\ 0 & (-a)^n \end{bmatrix} = \begin{bmatrix} a^{-n} & -n a^{-n-1} \\ 0 & a^{-n} \end{bmatrix} $$ which means that $(1)$ holds for negative exponents as well. As a consequence, $(2)$ can be generalized to functions admitting a Laurent series representation: $$ f(z) = \sum_{n=-\infty}^\infty c_n z^n $$
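As a quick numerical sanity check of $(2)$ with $f = \exp$ (a sketch using SciPy's `expm`; the test value is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

a = 1.7  # arbitrary test value
A = np.array([[a, 1.0],
              [0.0, a]])

# Formula (2) with f = exp predicts exp(A) = [[e^a, e^a], [0, e^a]].
predicted = np.exp(a) * np.array([[1.0, 1.0],
                                  [0.0, 1.0]])
print(np.allclose(expm(A), predicted))  # True
```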

Martin R
  • Thank you for this wonderful solution! This makes me very curious about generalizing this. Does this mean that if we have an operator $F$ on the space of functions which admit Laurent series representations, which satisfies $F(af(x)+bg(x))=aF(f(x))+bF(g(x))$ for scalars $a$ and $b$ and also that $F(x^n)=nx^{n-1}$ for all integers $n$, must it be true that $F$ is the differentiation operator? It seems to me that the only properties that you use are linearity and its effect on $x^n$, so I am curious if this generalizes? – ASKASK Nov 23 '15 at 03:08
  • @ASKASK: I am afraid that goes beyond my knowledge of this topic. Perhaps that follows from the solutions given in the other answers here? Otherwise you might consider to ask a new question about this generalization. – Martin R Nov 23 '15 at 06:20
  • @ASKASK: Up to possibly some technical constraints on the function space you're working on (and I suspect "admitting Laurent series" is sufficient, but there may be other technicalities), yes, a linear operator that acts as differentiation on monomials should necessarily be the differentiation operator. – R.. GitHub STOP HELPING ICE Nov 23 '15 at 16:16
  • In the last lines it should be $(-a^{-2})^n$, not $(-a^{2})^n$, because $a^{-1}=(-a^{-2})(-a)$. Also in the next line it should be $(-1)^n a^{-2n}$, not $(-1)^n a^{2n}$. Everything else stays the same. – user236182 Oct 05 '17 at 17:12
  • @user236182: You are right, thank you! – Martin R Oct 05 '17 at 18:40
  • @MartinR Good evening Mr. I just wanted to say something about the today's idiotic post "solve $sin(1/x) > 0$". It was not me who wrote it, it was a friend of mine. I told him to use Stack Exchange about some problems he had, because I did not have time for him (my friend is a first year university freshman, I sometimes invite him in my office). I showed him MSE from another computer, where I was logged in, and he took the advantage and he asked the question. I noticed only now all the misdeed, whence my apologies for everything. – Laplacian Apr 16 '18 at 17:29
  • @VonNeumann: That explains it! I can imagine how annoyed you were :) – Martin R Apr 17 '18 at 05:24
  • @MartinR Indeed!! When I saw your comment (with the link to THAT answer) I basically frozen. I felt so ashamed! At least the question got deleted – Laplacian Apr 17 '18 at 10:48
69

It's a general statement: if $J$ is an $n \times n$ Jordan block with eigenvalue $\lambda_0$ and $f$ is a sufficiently differentiable function, then \begin{equation} f(J)=\left(\begin{array}{ccccc} f(\lambda_{0}) & \frac{f'(\lambda_{0})}{1!} & \frac{f''(\lambda_{0})}{2!} & \ldots & \frac{f^{(n-1)}(\lambda_{0})}{(n-1)!}\\ 0 & f(\lambda_{0}) & \frac{f'(\lambda_{0})}{1!} & & \vdots\\ 0 & 0 & f(\lambda_{0}) & \ddots & \frac{f''(\lambda_{0})}{2!}\\ \vdots & \vdots & \vdots & \ddots & \frac{f'(\lambda_{0})}{1!}\\ 0 & 0 & 0 & \ldots & f(\lambda_{0}) \end{array}\right) \end{equation} where \begin{equation} J=\left(\begin{array}{ccccc} \lambda_{0} & 1 & & & \\ & \lambda_{0} & 1 & & \\ & & \ddots & \ddots & \\ & & & \lambda_{0} & 1\\ & & & & \lambda_{0} \end{array}\right). \end{equation} This statement can be demonstrated in various ways (none of them short), and it is a well-known formula. You can find it in various books, for example in Horn and Johnson's Matrix Analysis.
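For instance, a short SymPy check of the formula for a $3 \times 3$ Jordan block with $f = \exp$ (a sketch; it relies on SymPy's `Matrix.exp`):

```python
import sympy as sp

lam = sp.symbols('lambda_0')
J = sp.Matrix([[lam, 1, 0],
               [0, lam, 1],
               [0, 0, lam]])  # 3x3 Jordan block

# The formula predicts f(lam) on the diagonal, f'(lam)/1! above it, and
# f''(lam)/2! in the corner; for f = exp every derivative is exp(lam).
predicted = sp.Matrix([[sp.exp(lam), sp.exp(lam), sp.exp(lam)/2],
                       [0,           sp.exp(lam), sp.exp(lam)],
                       [0,           0,           sp.exp(lam)]])
print(sp.simplify(J.exp() - predicted))  # zero matrix
```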

Dac0
  • Does this have a name? What's the intuition behind it? – nbubis Nov 22 '15 at 22:52
  • @nbubis *is a Jordan block and f a function matrix* ? – Red Banana Nov 22 '15 at 23:42
  • In fact, this is sometimes taken to be the *definition* of a matrix function when evaluated at a Jordan block; see [this](http://books.google.com/books?id=S6gpNn1JmbgC&pg=PA3). See [this paper](http://dx.doi.org/10.2307/2306996) as well. – J. M. ain't a mathematician Nov 23 '15 at 05:23
  • Could you include the definition or a link to the definition of a Jordan block in this answer? – Mario Carneiro Nov 23 '15 at 07:46
  • A previous edit changed $J_k$ to $J$ in all but one occurrence. However, simply changing $J_k$ to $J$ in the first sentence doesn't seem to me to read well. Would it be correct to move the given $J$ up to the first line as "*... if $J$ is the Jordan block ...*"? – Peter Taylor Apr 05 '18 at 10:10
26

Write $$ A = a \mathbb{I}+M, $$ where $$M = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}.$$

Most relevantly, $M$ is a nilpotent upper triangular matrix satisfying $M^n=0$ for $n>1$.

$$ A^n=\left(a \mathbb{I}+M \right)^n $$ Since these matrices ($\mathbb{I}$ and $M$) commute, we can expand with the binomial formula: $$A^n = \sum_{i=0}^n\binom ni a^i \mathbb{I}^iM^{n-i}.$$ The only non-zero terms are those with $i=n$ and $i=n-1$, so $$A^n= a^n \mathbb{I}+na^{n-1}M. $$
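A tiny SymPy spot check of this collapse of the binomial sum (a sketch; the exponent $n = 5$ is arbitrary):

```python
import sympy as sp

a = sp.symbols('a')
I2, M = sp.eye(2), sp.Matrix([[0, 1], [0, 0]])

print(M**2)  # zero matrix, so M^i = 0 whenever i > 1
n = 5        # arbitrary exponent for the spot check
print(sp.simplify((a*I2 + M)**n - (a**n*I2 + n*a**(n-1)*M)))  # zero matrix
```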

WW1
23

Here's one for the algebra lovers. It can probably be improved substantially, since I'm not an algebra lover myself. For simplicity, I'll use rational functions with rational coefficients; sophisticated readers may substitute their favorite coefficient fields.

A rational function in the variable $x$ is an element of the field $\mathbb{Q}(x)$. You can think of $\mathbb{Q}(x)$ as a vector space over itself. Multiplication by $x$ is a $\mathbb{Q}(x)$-linear map from $\mathbb{Q}(x)$ to itself, which can be written as the $1 \times 1$ matrix $$X_0 = \left[ \begin{array}{c} x \end{array} \right].$$ The derivative, sadly, is not a $\mathbb{Q}(x)$-linear map. However, we can represent it with a $\mathbb{Q}(x)$-linear map by resorting to trickery. If we happen to know the derivative $f'$ of a function $f$, we can bundle the two together into an element of $\mathbb{Q}(x)^2$: $$\left[ \begin{array}{c} f' \\ f \end{array} \right].$$ Multiplication by $x$ sends this "derivative pair" to $$\left[ \begin{array}{c} (xf)' \\ xf \end{array} \right] = \left[ \begin{array}{c} xf' + f \\ xf \end{array} \right] = \left[ \begin{array}{cc} x & 1 \\ 0 & x \end{array} \right] \left[ \begin{array}{c} f' \\ f \end{array} \right].$$ In other words, multiplication by $x$ acts on derivative pairs in $\mathbb{Q}(x)^2$ by the matrix $$X_1 = \left[ \begin{array}{cc} x & 1 \\ 0 & x \end{array} \right].$$


The set of derivative pairs isn't very nice. In particular, it's not a subspace of $\mathbb{Q}(x)^2$. We can say one thing about it, though: we know it contains $$\left[ \begin{array}{c} 1' \\ 1 \end{array} \right] = \left[ \begin{array}{c} 0 \\ 1 \end{array} \right].$$ Using the matrix $X_1$, we can develop this fact into a complete understanding of derivative pairs.

To see how, recall that the multiplication-by-$x$ operator on $\mathbb{Q}(x)$ is invertible. In fact, plugging the multiplication-by-$x$ operator into any rational function gives a well-defined $\mathbb{Q}(x)$-linear operator on $\mathbb{Q}(x)$. Every rational function can be built up from $1 \in \mathbb{Q}(x)$ by applying a rational function of the multiplication-by-$x$ operator. In matrices, $$\left[ \begin{array}{c} f \end{array} \right] = f(X_0) \left[ \begin{array}{c} 1 \end{array} \right].$$ Similarly, any derivative pair in $\mathbb{Q}(x)^2$ can be built up from $$\left[ \begin{array}{c} 0 \\ 1 \end{array} \right] \in \mathbb{Q}(x)^2$$ by applying a rational function of $X_1$: $$\left[ \begin{array}{c} f' \\ f \end{array} \right] = f(X_1) \left[ \begin{array}{c} 0 \\ 1 \end{array} \right].$$ That explains the mystifying behavior you noticed.
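Here is a small SymPy check of that last identity for one concrete rational function (a sketch; the choice $f = (x^2+1)/(x-2)$ is arbitrary):

```python
import sympy as sp

x = sp.symbols('x')
X1 = sp.Matrix([[x, 1], [0, x]])
I2 = sp.eye(2)

# f(x) = (x^2 + 1)/(x - 2), evaluated at X1 as p(X1) * q(X1)^(-1).
f = (x**2 + 1) / (x - 2)
f_of_X1 = (X1**2 + I2) * (X1 - 2*I2).inv()

pair = f_of_X1 * sp.Matrix([0, 1])        # apply f(X1) to the derivative pair of 1
expected = sp.Matrix([sp.diff(f, x), f])  # the derivative pair [f', f]
print(sp.simplify(pair - expected))       # zero vector
```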


This idea can be extended to higher derivatives. For example, multiplication by $x$ acts on "derivative quadruples" $$\left[ \begin{array}{c} f''' \\ f'' \\ f' \\ f \end{array} \right] \in \mathbb{Q}(x)^4$$ by the matrix $$X_3 = \left[ \begin{array}{cccc} x & 3 & 0 & 0 \\ 0 & x & 2 & 0 \\ 0 & 0 & x & 1 \\ 0 & 0 & 0 & x \end{array} \right],$$ yielding the formula $$\left[ \begin{array}{c} f''' \\ f'' \\ f' \\ f \end{array} \right] = f(X_3) \left[ \begin{array}{c} 0 \\ 0 \\ 0 \\ 1 \end{array} \right].$$ I'm certain this whole business is related to Myself's comment about dual numbers, but I haven't worked out how.

Vectornaut
22

Just to expand on the comment of @Myself above...

Adjoining algebraics as matrix maths

Sometimes in mathematics or computation you can get away with adjoining an algebraic number $\alpha$ to some simpler ring $R$ of numbers, like the integers or the rationals $\mathbb Q$, and these adjoined numbers characterize all of your solutions. If this number obeys an algebraic equation $$\alpha^n = \sum_{k=0}^{n-1} q_k \alpha^k$$ for some coefficients $q_k \in R,$ we call the corresponding polynomial equation $Q(\alpha) = 0$, and then we can adjoin this number by working with polynomials of degree $n - 1$ with coefficients from $R$, evaluated at $\alpha$: the resulting ring is formally denoted $R[\alpha]/Q(\alpha),$ "the quotient ring of the polynomials with coefficients in $R$ in some parameter $\alpha$, given their equivalence modulo polynomial division by $Q(\alpha).$"

If you write down the action of the multiplication $(\alpha\cdot)$ on the vector in $R^n$ corresponding to such a polynomial in this ring, it will look like the matrix $$\alpha \leftrightarrow A = \begin{bmatrix}0 & 0 & 0 & \dots & 0 & q_0\\ 1 & 0 & 0 & \dots & 0 & q_1 \\ 0 & 1 & 0 & \dots & 0 & q_2 \\ \dots & \dots & \dots & \dots & \dots & \dots \\ 0 & 0 & 0 & \dots & 0 & q_{n-2} \\ 0 & 0 & 0 & \dots & 1 & q_{n-1} \\ \end{bmatrix},$$ and putting such a matrix in row-reduced echelon form is so simple that we can immediately see that $\alpha$ is invertible in $R[\alpha]/Q(\alpha)$ whenever $R$ is a field and $q_0 \ne 0$ (and the quotient is itself a field precisely when $Q$ is irreducible over $R$). The matrix $\sum_k p_k A^k$ is then a matrix representation of the polynomial $\sum_k p_k \alpha^k$ which implements all of the required operations as matrix operations.

A simple example before \$#!& gets real.

For example, there is a famous "O(1)" solution for generating the $k^\text{th}$ Fibonacci number, which comes from observing that the recurrence $F_k = F_{k-1} + F_{k-2}$ is satisfied by other sequences with boundary conditions other than $F_{0,1} = 0,1$, and one very special set of solutions looks like $F_k = \phi^k$ for some $\phi$. Plugging and chugging we get the algebraic equation $\phi^2 = \phi + 1,$ which we can solve for the golden ratio $\varphi = (1 + \sqrt{5})/2$ and its negative reciprocal $\bar\varphi = -\varphi^{-1}= (1-\sqrt{5})/2.$ However, since the Fibonacci recurrence relation is linear, any linear combination $$F_n = A \varphi^n + B \bar\varphi^n$$ obeys the Fibonacci recurrence, and we can just choose $A = \sqrt{1/5},\; B = -\sqrt{1/5}$ to get the standard $F_{0,1} = 0,1$ starting points: this is the Fibonacci sequence defined purely in terms of exponentiation.

But, there is a problem with using this on a computer: the Double type that a computer has access to has only finite precision and the above expressions will round off wildly. What we really want is to use our arbitrary-precision Integer type to calculate this. We can do this with matrix exponentiation in a couple of different ways. The first would be to adjoin the number $\sqrt{5}$ to the integers, solving $\alpha^2 = 5.$ Then our ring consists of the numbers $a + b \sqrt{5}$ which are the matrices: $$\begin{bmatrix}a & 5b\\b & a\end{bmatrix}.$$ And that's easy-peasy to program. Your "unit vectors" can similarly be chosen as $1$ and $\varphi$ however, and that leads to the "no-nonsense" matrix $$a + b \varphi = \begin{bmatrix}a & b\\ b & a + b\end{bmatrix}$$ which I'm calling "no-nonsense" because for $a = 0, b = 1$ this is actually the Fibonacci recurrence relation on vectors $[F_{n-1}, F_{n}],$ which is a way to get to this result without going through the above hoops. There is also an interesting "symmetric" version where $\varphi$ and $\bar\varphi$ are our "unit vectors" and the matrix is (I think) $a \varphi + b \bar\varphi \leftrightarrow [2a-b,\; -a+b;\;a - b,\; -a + 2b].$

(In any case, it turns out that the supposedly "O(1)" algorithm is not: even when we exponentiate-by-squaring we have to perform $\log_2(k)$ multiplications of numbers $m_i = F_{2^i}$ which are growing asymptotically like $F_k \approx \varphi^k/\sqrt{5},$ taking some $O(k^2)$ time, just like adding up the numbers directly. The big speed gain is that the adding-bignums code will "naturally" allocate new memory for each bignum and will therefore write something like $O(k^2)$ bits in memory if you don't specially make it intelligent; the exponentiation improves this to $O(k~\log(k))$ and possibly even to $O(k)$ since the worst of these happen only at the very end.)
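A minimal Python sketch of the exponentiate-by-squaring approach with the "no-nonsense" matrix (my own illustration; only the $2\times 2$ representation comes from the discussion above, and Python's arbitrary-precision integers play the role of the bignum type):

```python
def fib(k: int) -> int:
    """F_k via exponentiation-by-squaring of ((0, 1), (1, 1)),
    the "no-nonsense" matrix for a = 0, b = 1, i.e. phi itself."""
    def mat_mul(X, Y):
        (a, b), (c, d) = X
        (e, f), (g, h) = Y
        return ((a*e + b*g, a*f + b*h),
                (c*e + d*g, c*f + d*h))

    result = ((1, 0), (0, 1))   # identity
    base = ((0, 1), (1, 1))     # a + b*phi with a = 0, b = 1
    n = k
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]         # result = [[F_{k-1}, F_k], [F_k, F_{k+1}]]

print([fib(k) for k in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```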

Going off of the algebraics into complex numbers

Interestingly, we don't need to restrict ourselves to real numbers when we do the above. We know that in $\mathbb R$ there is no $x$ satisfying $x^2 = -1,$ so the above prescribes that we extend our field to the field $$ a + b \sqrt{-1} \leftrightarrow \begin{bmatrix}a & -b\\b & a\end{bmatrix}.$$When we replace $a$ with $r\cos\theta$ and $b$ with $r\sin\theta$ we find out that in fact these "complex numbers" are all just scaled rotation matrices:$$ r (\cos\theta + i~\sin\theta) = r \begin{bmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{bmatrix} = r~R_\theta,$$giving us an immediate geometric understanding of a complex number as a scaled rotation (and then analytic functions are just the ones which locally look like a scaled rotation.)
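A short NumPy check that this $2\times 2$ representation really multiplies like complex numbers (a sketch with arbitrarily chosen test values):

```python
import numpy as np

def as_matrix(z: complex) -> np.ndarray:
    """Represent a + bi as the scaled rotation matrix [[a, -b], [b, a]]."""
    return np.array([[z.real, -z.imag],
                     [z.imag,  z.real]])

z, w = 1 + 2j, 3 - 1j  # arbitrary test values
print(np.allclose(as_matrix(z) @ as_matrix(w), as_matrix(z * w)))  # True
```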

Going way off the algebraics into infinitesimals.

Another interesting way to go with this is to consider adjoining a term $\epsilon$ which is not zero, but which squares to zero. This idea formalizes the idea of an "infinitesimal" with no real effort, although as mentioned before, the resulting algebra is doomed to be a ring. (We could adjoin an inverse $\infty = \epsilon^{-1}$ too but presumably we'd then have $\infty^2 = \infty$ which breaks associativity, $(\epsilon\cdot\infty)\cdot\infty \ne \epsilon\cdot(\infty\cdot\infty),$ unless we insert more infinities to push the problem out to infinity.)

Anyway, we then have the matrix: $$a + b \epsilon \leftrightarrow a I + b E = \begin{bmatrix}a & 0\\ b & a\end{bmatrix}.$$ It's precisely the transpose of what you were looking at. Following the rules, $(a + b \epsilon)^n = a^n + n~a^{n-1}~b~\epsilon$, with all of the other terms vanishing. Expanding it out, we find by Taylor expansion that $f(x + \epsilon) = \sum_n \frac{f^{(n)}(x)}{n!} \epsilon^n = f(x) + f'(x)\,\epsilon,$ and this is the property that you have seen in your own examination.
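This is essentially the trick behind forward-mode automatic differentiation; here is a small NumPy sketch (my own illustration, with an arbitrary test function built only from ring operations):

```python
import numpy as np

def dual(x0: float) -> np.ndarray:
    """Represent x0 + epsilon as the matrix [[x0, 0], [1, x0]] used above."""
    return np.array([[x0, 0.0],
                     [1.0, x0]])

x0 = 2.0
D, I = dual(x0), np.eye(2)

# f(t) = t^3 - 4/t, evaluated with matrix operations only; the diagonal
# carries f(x0) and the lower-left entry carries f'(x0).
F = np.linalg.matrix_power(D, 3) - 4.0 * np.linalg.inv(D)
print(F[0, 0], F[1, 0])  # 6.0 (= f(2)) and 13.0 (= f'(2))
```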

We can similarly keep infinitesimals out to second order with a 3x3 matrix $$a + b \epsilon + c \epsilon^2 \leftrightarrow \begin{bmatrix}a & 0 & 0\\ b & a & 0\\ c & b & a\end{bmatrix}$$Then $f(x I + E) = f(x) I + f'(x) E + f''(x) E^2 / 2$ straightforwardly.

CR Drost
9

If $e$ satisfies $Xe=eX$ and $e^k=0$, then $f(X+e)=f(X) + ef'(X)+\frac{e^2}{2}f''(X) + \dots + \frac{e^{k-1}}{(k-1)!}f^{(k-1)}(X)$ (a finite sum with $k$ terms).

This is true for polynomials and thus for power series that converge in a neighborhood of $X$.

ASCII Advocate