Question:

We have a function that takes a two-dimensional input $x = (x_1, x_2)$ and has two parameters $w = (w_1, w_2)$, given by $f(x, w) = σ(σ(x_1 w_1) w_2 + x_2)$, where $σ(x) = 1/(1+e^{-x})$. We use backpropagation to estimate the right parameter values, and we start by setting both parameters to 0. Assume that we are given a training point $x_1 = 1$, $x_2 = 0$, $y = 5$. Given this information, answer the next two questions. What is the value of $∂f/∂w_2$?

Solution:

Write $σ(x_1w_1)w_2 + x_2$ as $o_2$ and $x_1w_1$ as $o_1$. Then:

$∂f/∂w_2 = ∂f/∂o_2 × ∂o_2/∂w_2$

$∂f/∂w_2= σ(o_2)(1 − σ(o_2)) × σ(o_1)$ # Need to understand here

$∂f/∂w_2 = 0.5 ∗ 0.5 ∗ 0.5 =0.125$

Can someone help me understand the solution? Which expression for $f$ is differentiated with respect to $o_2$ to get $σ(o_2)(1 − σ(o_2))$?

I also don't understand where the $0.5$ came from.

Please help.

Sai Charan

1 Answer

The sigmoid function $\sigma(x)=[1+\exp(-x)]^{-1}$ has the following derivative:

$$ \frac{\partial \sigma}{\partial x} = \sigma(x)[1-\sigma(x)] \tag{1} $$

Let us now define

\begin{align} g(x,w) &= \sigma(x_1w_1)w_2 + x_2 \\ f(x,w) &= \sigma(g(x,w)) = \sigma( \sigma(x_1w_1)w_2 + x_2 ) \end{align}

Notice that $g$ is linear in $w_2$, so that:

$$ \frac{\partial g}{\partial w_2} = \sigma(x_1 w_1) \tag{2} $$

Using the chain rule, we get

\begin{align} \frac{\partial f}{\partial w_2} &= \frac{\partial \sigma}{\partial g} \frac{\partial g}{\partial w_2} \\[3mm] &= \underbrace{\sigma(g(x,w))[1 - \sigma(g(x,w))]}_{\text{From (1)}} \;\underbrace{\sigma(x_1 w_1)}_{\text{From (2)}} \\ &= \sigma(o_2)[1 - \sigma(o_2)] \sigma(o_1) \end{align}

where the last step uses $g(x,w)=: o_2$ and $x_1w_1=: o_1$.

The only confusing part is probably the derivative of the sigmoid function (which I linked to above). My favourite proof is this one by Hans Lundmark.
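For reference, here is one short way to see (1) by differentiating directly. This is a sketch of a standard calculation and not necessarily the argument in the linked proof:

$$
\begin{aligned}
\frac{\partial \sigma}{\partial x}
&= \frac{\partial}{\partial x}\,[1+\exp(-x)]^{-1}
= \frac{\exp(-x)}{[1+\exp(-x)]^{2}} \\
&= \frac{1}{1+\exp(-x)} \cdot \frac{\exp(-x)}{1+\exp(-x)}
= \sigma(x)\,[1-\sigma(x)]
\end{aligned}
$$

The last equality uses $1-\sigma(x) = \dfrac{\exp(-x)}{1+\exp(-x)}$.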


As for where the $0.5$ comes from, since $w_1=w_2=0$, $x_1=1$, and $x_2=0$, we have \begin{align} o_1 &= x_1 w_1 = 0 \\ \sigma(o_1) &= \sigma(0) = [1 + \exp(0)]^{-1} = 2^{-1} = 0.5 \\ o_2 &= g(x,w) = \sigma(0)0 + 0 = 0\\ \sigma(o_2) &= \sigma(0) = 0.5 \\[2mm] \therefore\;\;\; \frac{\partial f}{\partial w_2} &= \sigma(0)[1 - \sigma(0)] \sigma(0) = 0.5[1-0.5]0.5= 0.5^3 \end{align}
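If you want to sanity-check this numerically, a minimal Python sketch along the following lines should do. The function names (`sigmoid`, `f`, `df_dw2`) and the step size `eps` are my own choices for illustration and are not part of the original problem. It evaluates the chain-rule formula at the given point and compares it against a central finite difference.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def f(x1, x2, w1, w2):
    """The function from the question: sigma(sigma(x1*w1)*w2 + x2)."""
    return sigmoid(sigmoid(x1 * w1) * w2 + x2)


def df_dw2(x1, x2, w1, w2):
    """Chain-rule result: sigma(o2) * (1 - sigma(o2)) * sigma(o1)."""
    o1 = x1 * w1
    o2 = sigmoid(o1) * w2 + x2
    return sigmoid(o2) * (1.0 - sigmoid(o2)) * sigmoid(o1)


# Values from the question: training point x = (1, 0), parameters w = (0, 0).
x1, x2 = 1.0, 0.0
w1, w2 = 0.0, 0.0

analytic = df_dw2(x1, x2, w1, w2)

# Central finite difference in w2; eps is an arbitrary small step (assumption).
eps = 1e-6
numeric = (f(x1, x2, w1, w2 + eps) - f(x1, x2, w1, w2 - eps)) / (2 * eps)

print("analytic:", analytic)
print("numeric: ", numeric)
```

The analytic value is $0.5^3 = 0.125$, and the finite-difference estimate should agree with it up to a small numerical error.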

user3658307