I want to compute the derivative of the binary cross-entropy loss w.r.t. the input of the sigmoid function, and I was wondering if there's a closed-form expression. I've seen derivations of binary cross-entropy loss with respect to model weights/parameters (derivative of cost function for Logistic Regression) as well as derivations of the sigmoid function w.r.t. its input (Derivative of sigmoid function $\sigma (x) = \frac{1}{1+e^{-x}}$), but nothing that combines the two. I would greatly appreciate any help with this.

There's also a post that computes the derivative of categorical cross-entropy loss w.r.t. pre-softmax outputs (Derivative of Softmax loss function). I am looking for something similar in the binary case (perhaps that derivation generalizes to the binary case, but I'm not sure).

Jane Sully

1 Answer


Use properties of logarithms to simplify as much as possible before taking the derivative.

Let $0 \leq p \leq 1$. We want to compute the derivative of the function \begin{align} L(u) &= -p \log(\sigma(u)) - (1-p)\log(1 - \sigma(u)) \\ &= -p\log\left( \frac{e^u}{1+e^u} \right) - (1-p) \log\left( \frac{1}{1+e^u}\right) \\ &= -p\left(u - \log(1+e^u)\right) + (1-p)\log(1+e^u) \\ &= -pu +\log(1 + e^u). \end{align}
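As a quick numerical sanity check (a minimal sketch, not part of the original answer; the function names are my own), the simplified expression can be compared against the original definition of the loss:

```python
import math

def sigmoid(u):
    # sigma(u) = 1 / (1 + e^{-u}), equivalently e^u / (1 + e^u)
    return 1.0 / (1.0 + math.exp(-u))

def bce_original(p, u):
    # original form: -p*log(sigma(u)) - (1-p)*log(1 - sigma(u))
    s = sigmoid(u)
    return -p * math.log(s) - (1 - p) * math.log(1 - s)

def bce_simplified(p, u):
    # simplified form from the derivation: -p*u + log(1 + e^u)
    return -p * u + math.log(1 + math.exp(u))

# the two forms agree for any p in [0, 1] and any real u
for p in (0.0, 0.3, 1.0):
    for u in (-2.0, 0.0, 1.5):
        assert abs(bce_original(p, u) - bce_simplified(p, u)) < 1e-9
```

(In practice, libraries compute the simplified form with a log-sum-exp trick to avoid overflow for large $u$, but the check above suffices for moderate inputs.)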

Look how much $L(u)$ simplified! Sigmoid and binary cross-entropy are a match made in heaven.

It is now easy to take the derivative of $L$: $$ L'(u) = -p + \frac{e^u}{1+e^u} = \sigma(u) - p. $$
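The closed form $\sigma(u) - p$ can be verified independently with a central finite difference (again, a small sketch of my own, not from the original answer):

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def loss(p, u):
    # simplified binary cross-entropy: -p*u + log(1 + e^u)
    return -p * u + math.log(1 + math.exp(u))

def grad(p, u):
    # closed-form derivative: sigma(u) - p
    return sigmoid(u) - p

# central finite difference as an independent check of the closed form
h = 1e-6
for p in (0.1, 0.5, 0.9):
    for u in (-1.0, 0.0, 2.0):
        numeric = (loss(p, u + h) - loss(p, u - h)) / (2 * h)
        assert abs(numeric - grad(p, u)) < 1e-6
```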

This formula has a nice interpretation. If the predicted probability $\sigma(u)$ agrees perfectly with the ground truth probability $p$, then the derivative of $L$ is $0$, suggesting that we do not need to make any change to the value of $u$.

  • Thanks for this clean and simple derivation! Makes a lot of sense. One quick question: should the e^u in the sigmoid function be e^-u? Regardless I don't think that should impact the final answer. – Jane Sully Aug 20 '21 at 15:31
  • @JaneSully Note that $\frac{e^u}{1 + e^u} = \frac{1}{1 + e^{-u}}$, so the formula $\sigma(u) = \frac{1}{1 + e^{-u}}$ that you used is equivalent to the formula $\sigma(u) = \frac{e^u}{1 + e^u}$ that I'm using here. – littleO Aug 21 '21 at 02:48