I'm trying to follow the derivation of second order approximation of $\log \det X$ from page 658 of Boyd & Vandenberghe's Convex Optimization. How is the last step derived? I.e., where does the trace expression come from?

This should be just the trace of $\log X$. A relevant question is here: http://math.stackexchange.com/questions/38701/how-to-calculate-the-gradient-of-log-det-matrix-inverse – Bombyx mori Nov 29 '12 at 06:23

Added to [The List](https://math.meta.stackexchange.com/q/33414/339790) – Rodrigo de Azevedo Apr 07 '21 at 09:48
1 Answer
Short answer: The trace gives the scalar product on the space of matrices: $\langle X,Y \rangle = \mathrm{tr}(X^\top Y)$. Since you're working with symmetric matrices, you can forget the transposition: $\langle X,Y \rangle = \mathrm{tr}(XY)$.
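If it helps to see this concretely, here is a quick numerical sanity check (mine, not from the book) that the trace expression is the usual Frobenius inner product, i.e. the sum of elementwise products; the matrix size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))
Y = rng.standard_normal((3, 3))

# Frobenius inner product two ways: tr(X^T Y) vs. elementwise sum
lhs = np.trace(X.T @ Y)
rhs = np.sum(X * Y)
print(lhs, rhs)  # the two values should agree
```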
Long answer, with all the gory details: Given a function $f:\mathrm S_n^{++}\to\mathbf R$, the link between the gradient $\nabla_Xf$ of the function $f$ at $X$ (which is a vector) and its differential $d_Xf$ at $X$ (which is a linear form) is that for any $U\in\mathrm M_n$, $$ d_Xf(U) = \langle \nabla_Xf,U \rangle. $$ For your function $f$, since you know the gradient $\nabla_Xf = X^{-1}$, you can write the differential: $$ d_Xf(U) = \langle X^{-1},U \rangle = \mathrm{tr}(X^{-1}U). $$
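You can check this first-order identity numerically with a finite difference (my own sketch, not part of the original answer; the dimension, seed, and step size $t$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)   # symmetric positive definite
B = rng.standard_normal((4, 4))
U = (B + B.T) / 2             # symmetric direction

f = lambda M: np.log(np.linalg.det(M))
Xinv = np.linalg.inv(X)

# Finite difference (f(X + tU) - f(X)) / t should approach tr(X^{-1} U)
t = 1e-6
numeric = (f(X + t * U) - f(X)) / t
analytic = np.trace(Xinv @ U)
print(numeric, analytic)  # the two values should be close
```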
What about the second-order differential? Well, it's the differential of the differential. Let's take it slow. The differential of $f$ is the function $df:\mathrm S_n^{++}\to\mathrm L(\mathrm M_n,\mathbf R)$, defined by $df(X) = V\mapsto \mathrm{tr}(X^{-1}V)$. To find the differential of $df$ at $X$, we look at $df(X+\Delta X)$, and take the part that varies linearly in $\Delta X$. Since $df(X+\Delta X)$ is a function $\mathrm M_n\to\mathbf R$, if we hope to ever understand anything we should apply it to some matrix $V$: $$ df(X+\Delta X)(V) = \mathrm{tr}\left[ (X+\Delta X)^{-1} V \right] $$ and use the approximation from the passage you cited: \begin{align*} df(X+\Delta X)(V) &\simeq \mathrm{tr}\left[ \left(X^{-1} - X^{-1}(\Delta X)X^{-1}\right) V \right]\\ &= \mathrm{tr}(X^{-1}V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V)\\ &= df(X)(V) - \mathrm{tr}(X^{-1}(\Delta X)X^{-1}V). \end{align*} And we just see that the part that varies linearly in $\Delta X$ is the $-\mathrm{tr}(\cdots)$ term. So the second differential is the function $d^2f:\mathrm S_n^{++}\to\mathrm L(\mathrm M_n, \mathrm L(\mathrm M_n,\mathbf R))$ defined by $$ d^2_Xf(U)(V) = -\mathrm{tr}(X^{-1}UX^{-1}V). $$
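The same finite-difference trick verifies the second differential: perturbing the first differential $V\mapsto\mathrm{tr}(X^{-1}V)$ in a direction $U$ should reproduce $-\mathrm{tr}(X^{-1}UX^{-1}V)$. Again a sketch of my own (sizes, seed, and step size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)   # symmetric positive definite
B, C = rng.standard_normal((2, 4, 4))
U = (B + B.T) / 2             # symmetric directions
V = (C + C.T) / 2

# First differential of log det at M, applied to W: tr(M^{-1} W)
df = lambda M, W: np.trace(np.linalg.inv(M) @ W)

# Finite difference of df in direction U vs. -tr(X^{-1} U X^{-1} V)
t = 1e-6
numeric = (df(X + t * U, V) - df(X, V)) / t
Xinv = np.linalg.inv(X)
analytic = -np.trace(Xinv @ U @ Xinv @ V)
print(numeric, analytic)  # the two values should be close
```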

It's better to consider $d^2f_X$ as a symmetric bilinear form: $d^2f_X(U,V)=-\mathrm{tr}(X^{-1}UX^{-1}V)$. When you write the Taylor formula, you use $d^2f_X(U,U)=-\mathrm{tr}((X^{-1}U)^2)$. – Jan 09 '19 at 14:52
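Putting the comment's form into the Taylor expansion, $\log\det(X+tU)\simeq\log\det X + t\,\mathrm{tr}(X^{-1}U) - \tfrac{t^2}{2}\mathrm{tr}((X^{-1}U)^2)$, the remainder should shrink like $t^3$. A small check of that (my own, with arbitrary sizes and seed):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
X = A @ A.T + 4 * np.eye(4)   # symmetric positive definite
B = rng.standard_normal((4, 4))
U = (B + B.T) / 2             # symmetric direction

f = lambda M: np.log(np.linalg.det(M))
Xinv = np.linalg.inv(X)

# Second-order Taylor remainder at two step sizes; it should shrink fast
errs = []
for t in (1e-1, 1e-2):
    taylor = (f(X) + t * np.trace(Xinv @ U)
              - 0.5 * t**2 * np.trace((Xinv @ U) @ (Xinv @ U)))
    errs.append(abs(f(X + t * U) - taylor))
print(errs)  # second error much smaller than the first
```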