Questions tagged [gradient-descent]

"Gradient descent is a first-order optimization algorithm. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or of the approximate gradient) of the function at the current point."

Gradient descent is based on the observation that if the multivariable function $F(x)$ is defined and differentiable in a neighborhood of a point $a$, then $F(x)$ decreases fastest if one goes from $a$ in the direction of the negative gradient of $F$ at $a$, $-\nabla F(a)$. It follows that, if

$$a_{n+1}=a_n-\gamma \nabla F(a_n)$$

for a positive $\gamma$ that is small enough, then $F(a_n) \ge F(a_{n+1})$.
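As a concrete illustration of this update rule, here is a minimal sketch in Python; the quadratic objective, step size, and starting point are illustrative choices, not part of the tag description.

```python
import numpy as np

def gradient_descent(grad_F, a0, gamma=0.1, n_steps=100):
    """Iterate a_{n+1} = a_n - gamma * grad_F(a_n) with a fixed step size gamma."""
    a = np.asarray(a0, dtype=float)
    for _ in range(n_steps):
        a = a - gamma * grad_F(a)
    return a

# Illustrative example: F(x) = ||x||^2 has gradient 2x and its minimum at the origin.
minimizer = gradient_descent(lambda x: 2 * x, a0=[3.0, -4.0])
```

With $\gamma = 0.1$ on this example the iterates contract toward the origin by a factor of $0.8$ per step.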

881 questions
143 votes · 4 answers

Partial derivative in gradient descent for two variables

I've started taking an online machine learning class, and the first learning algorithm that we are going to be using is a form of linear regression using gradient descent. I don't have much of a background in high level math, but here is what I…
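For reference, with the usual mean-squared-error cost from such courses, $J(\theta_0,\theta_1)=\frac{1}{2m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)^2$ with hypothesis $h_\theta(x)=\theta_0+\theta_1 x$ (an assumption about the setup, since the excerpt is cut off), the two partial derivatives used in the update are

$$\frac{\partial J}{\partial \theta_0}=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big), \qquad \frac{\partial J}{\partial \theta_1}=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)\,x^{(i)}.$$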
60 votes · 2 answers

What is the difference between projected gradient descent and ordinary gradient descent?

I just read about projected gradient descent, but I did not see the intuition for using the projected version instead of ordinary gradient descent. Could you explain the reasoning and the situations in which projected gradient descent is preferable? What does that projection…
erogol
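The intuition, in brief: projected gradient descent takes an ordinary gradient step and then projects the result back onto a feasible set. A minimal sketch in Python, assuming box constraints so that the projection is just clipping (the objective and bounds are illustrative):

```python
import numpy as np

def projected_gradient_descent(grad_f, project, x0, gamma=0.1, n_steps=100):
    """Ordinary gradient step, followed by projection back onto the feasible set."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = project(x - gamma * grad_f(x))
    return x

# Illustrative example: minimize ||x - c||^2 subject to 0 <= x <= 1.
c = np.array([2.0, -0.5])
x_star = projected_gradient_descent(
    grad_f=lambda x: 2 * (x - c),
    project=lambda x: np.clip(x, 0.0, 1.0),  # Euclidean projection onto the box [0, 1]^2
    x0=np.zeros(2),
)
```

For this example the constrained minimizer is $(1, 0)$: the unconstrained minimizer $c$ gets clipped back into the box.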
50 votes · 2 answers

What is the difference between the Jacobian, Hessian and the Gradient?

I know there are a lot of topics about this on the internet, and trust me, I've googled it. But things are getting more and more confusing for me. From my understanding, the gradient is the slope of the most rapid descent. Modifying your position…
Pluviophile
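In short, for a scalar-valued $f:\mathbb{R}^n\to\mathbb{R}$ and a vector-valued $\mathbf g:\mathbb{R}^n\to\mathbb{R}^m$:

$$\nabla f = \left(\frac{\partial f}{\partial x_1},\dots,\frac{\partial f}{\partial x_n}\right), \qquad (H_f)_{ij}=\frac{\partial^2 f}{\partial x_i\,\partial x_j}, \qquad (J_{\mathbf g})_{ij}=\frac{\partial g_i}{\partial x_j},$$

so the gradient is the transpose of the Jacobian in the special case $m=1$, and the Hessian of $f$ is the Jacobian of its gradient.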
43 votes · 4 answers

Gradient descent with constraints

In order to find the local minima of a scalar function $p(x), x\in \mathbb{R}^3$, I know we can use the gradient descent method: $$x_{k+1}=x_k-\alpha_k \nabla_xp(x)$$ where $\alpha_k$ is the step size and $\nabla_xp(x)$ is the gradient of $p(x)$. My…
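Besides projection (see the projected gradient descent sketch above), one common workaround is to fold the constraint into the objective with a quadratic penalty and run plain gradient descent on the penalized function. A sketch under illustrative assumptions (a single linear equality constraint, hand-picked $\mu$ and $\alpha$):

```python
import numpy as np

def penalized_gradient_descent(grad_p, grad_penalty, x0, alpha=0.01, mu=10.0, n_steps=2000):
    """Gradient descent on p(x) + mu * penalty(x): a quadratic-penalty treatment of a constraint."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - alpha * (grad_p(x) + mu * grad_penalty(x))
    return x

# Illustrative example: minimize ||x||^2 subject to x_1 + x_2 + x_3 = 1,
# with the constraint replaced by the penalty mu * (x_1 + x_2 + x_3 - 1)^2.
a = np.ones(3)
x_star = penalized_gradient_descent(
    grad_p=lambda x: 2 * x,
    grad_penalty=lambda x: 2 * (a @ x - 1.0) * a,
    x0=np.zeros(3),
)
```

The penalty only enforces the constraint approximately; a larger $\mu$ tightens it but also worsens the conditioning, which is why projection or Lagrangian-based methods are often preferred.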
24 votes · 1 answer

Derivative of the log of the softmax function

Could someone explain how that derivative was arrived at? In my understanding, the derivative of $\log(\text{softmax})$ is $$ \nabla\log(\text{softmax}) = \begin{cases} 1-\text{softmax}, & \text{if } i=j \\ -\text{softmax}, & \text{if } i \neq j \end{cases}$$ …
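For reference, the usual derivation (with logits $z$ and $s_i=\frac{e^{z_i}}{\sum_k e^{z_k}}$) is

$$\log s_i = z_i - \log\sum_k e^{z_k} \quad\Longrightarrow\quad \frac{\partial \log s_i}{\partial z_j} = \delta_{ij} - \frac{e^{z_j}}{\sum_k e^{z_k}} = \begin{cases} 1 - s_j, & i = j \\ -s_j, & i \neq j, \end{cases}$$

which matches the form quoted in the question.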
22 votes · 1 answer

Gradient descent on non-convex function works. How?

For the Netflix Prize competition on recommendations, one method used stochastic gradient descent, popularized by Simon Funk, who used it to compute an approximate SVD. The math is better explained here on pg 152. A rating is predicted by…
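A minimal sketch of that kind of stochastic update, for the factorization $\hat r_{ui} = p_u \cdot q_i$; the dimensions, learning rate, regularization, and data below are illustrative, not Funk's actual settings:

```python
import numpy as np

def sgd_matrix_factorization(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, n_epochs=50):
    """Stochastic gradient descent on the squared error of r_ui ≈ p_u · q_i."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]               # error on one observed rating
            pu = P[u].copy()                    # keep the old value for the Q update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

# Tiny illustrative set of (user, item, rating) triples.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = sgd_matrix_factorization(ratings, n_users=2, n_items=3)
```

The objective is non-convex in $(P, Q)$ jointly, which is exactly the situation the question asks about.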
18 votes · 3 answers

Optimal step size in gradient descent

Suppose a differentiable, convex function $F(x)$ exists. Then $b = a - \gamma\nabla F(a)$ implies that $F(b) \leq F(a)$ given $\gamma$ is chosen properly. The goal is to find the optimal $\gamma$ at each step. In my book, in order to do this, one…
phil12
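One standard way to pick $\gamma$ at each step, when an exact line search is not available in closed form, is a backtracking (Armijo) line search. A sketch with illustrative constants:

```python
import numpy as np

def backtracking_step(F, grad_F, a, gamma0=1.0, beta=0.5, c=1e-4):
    """Shrink gamma until the Armijo sufficient-decrease condition holds, then take the step."""
    g = grad_F(a)
    gamma = gamma0
    while F(a - gamma * g) > F(a) - c * gamma * (g @ g):
        gamma *= beta
    return a - gamma * g

# Illustrative example: one step on F(x) = ||x||^2 from the point (3, -4).
a_next = backtracking_step(lambda x: x @ x, lambda x: 2 * x, np.array([3.0, -4.0]))
```

For a quadratic $F(x)=\tfrac12 x^T Q x - b^T x$ the exact minimizing step does have a closed form, $\gamma = \frac{\nabla F(a)^T \nabla F(a)}{\nabla F(a)^T Q\, \nabla F(a)}$, so backtracking is mainly useful beyond that case.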
16 votes · 7 answers

How can I "see" that calculus works for multidimensional problems?

Let's say I have some function $f(x) = x^2 + b$. I can see what's going on; I can work out the slope geometrically even without knowing the rules of derivatives. When I need to minimize some cost function for a linear problem (linear regression with…
hey
15 votes · 1 answer

Stochastic gradient descent for convex optimization

What happens if a convex objective is optimized by stochastic gradient descent? Is a global solution achieved?
14 votes · 4 answers

Intuition Behind Accelerated First Order Methods

$\newcommand{\prox}{\operatorname{prox}}$ $\newcommand{\argmin}{\operatorname{argmin}}$ Suppose that we want to solve the following convex optimization problem: $\min_{x \in \mathbb{R}^n} g(x) + h(x)$ where we assumed that $g(x)$ is convex and…
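For the smooth part alone ($h = 0$, so no proximal step is needed), the accelerated method boils down to taking the gradient step at an extrapolated point. A minimal sketch with an illustrative quadratic $g$ and a standard momentum schedule:

```python
import numpy as np

def accelerated_gradient_descent(grad_g, x0, lr=0.1, n_steps=100):
    """Nesterov-style acceleration: evaluate the gradient at an extrapolated point y_k."""
    x = np.asarray(x0, dtype=float)
    x_prev = x.copy()
    for k in range(1, n_steps + 1):
        y = x + (k - 1) / (k + 2) * (x - x_prev)  # momentum / extrapolation term
        x_prev = x
        x = y - lr * grad_g(y)                    # gradient step taken at y, not at x
    return x

# Illustrative example: g(x) = ||x - c||^2.
c = np.array([1.0, 2.0])
x_star = accelerated_gradient_descent(lambda x: 2 * (x - c), x0=np.zeros(2))
```

With a nonzero $h$, the gradient step is replaced by a proximal step on $h$ taken at the same extrapolated point, which gives FISTA-type methods.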
14 votes · 3 answers

Gradient is NOT the direction that points to the minimum or maximum

I understand that the gradient is the direction of steepest descent (ref: Why is gradient the direction of steepest ascent? and Gradient of a function as the direction of steepest ascent/descent). However, I am not able to visualize it. The Blue…
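A small worked example may help with the visualization: take $F(x,y) = x^2 + 10y^2$, whose minimum is at the origin. At the point $(1,1)$,

$$-\nabla F(1,1) = (-2,\,-20),$$

whereas the direction from $(1,1)$ straight toward the minimizer is proportional to $(-1,-1)$. The negative gradient is the locally steepest downhill direction, but on these elongated level sets it points mostly "across the valley" rather than at the minimum.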
11 votes · 3 answers

Why does gradient descent work?

On Wikipedia, this is the following description of gradient descent: Gradient descent is based on the observation that if the multivariable function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then…
11 votes · 6 answers

Difference between Gradient Descent method and Steepest Descent

What is the difference between Gradient Descent method and Steepest Descent methods? In this book, they have come under different sections: http://stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf According to page 480, Gradient Descent is: $$\Delta…
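Roughly, in that book's framing (paraphrased here, not quoted), the steepest descent direction depends on a choice of norm, and gradient descent is the Euclidean special case:

$$\Delta x_{\text{nsd}} = \operatorname*{argmin}_{\|v\|=1} \nabla f(x)^{T}v, \qquad \|\cdot\| = \|\cdot\|_2 \;\Longrightarrow\; \Delta x_{\text{nsd}} = -\frac{\nabla f(x)}{\|\nabla f(x)\|_2},$$

so with the Euclidean norm the steepest descent step reduces to a (scaled) gradient descent step, while other norms give genuinely different directions.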
11 votes · 1 answer

Gauss-Newton versus gradient descent

I would like to ask first whether the second-order gradient descent method is the same as the Gauss-Newton method. There is something I didn't understand. I read that with Newton's method the step we take in each iteration is along a quadratic curve…
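For a nonlinear least-squares objective $\tfrac12\|r(x)\|^2$ with residual Jacobian $J$, the two updates being compared are

$$\text{gradient descent: } x_{k+1} = x_k - \gamma\, J^{T}r, \qquad \text{Gauss–Newton: } x_{k+1} = x_k - \big(J^{T}J\big)^{-1} J^{T}r,$$

where Gauss–Newton uses $J^{T}J$ as an approximation of the Hessian (dropping the $\sum_i r_i \nabla^2 r_i$ term), so it behaves like a Newton-type method rather than a first-order one.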
10 votes · 2 answers

Does gradient descent converge to a minimum-norm solution in least-squares problems?

Consider running gradient descent (GD) on the following optimization problem: $$\arg\min_{\mathbf x \in \mathbb R^n} \| A\mathbf x-\mathbf b \|_2^2$$ where $\mathbf b$ lies in the column space of $A$, and the columns of $A$ are not linearly…
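A quick numerical check of the behavior the question is about, under illustrative assumptions (a random wide $A$, $\mathbf b$ constructed to lie in its column space, and the iteration started at $\mathbf x_0 = 0$):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))          # wide matrix: infinitely many least-squares solutions
b = A @ rng.standard_normal(5)           # guarantees b lies in the column space of A

gamma = 0.5 / np.linalg.norm(A, 2) ** 2  # step size well below the stability limit 1 / ||A||_2^2
x = np.zeros(5)                          # start at 0, i.e. inside the row space of A
for _ in range(20000):
    x = x - gamma * 2 * A.T @ (A @ x - b)   # gradient of ||Ax - b||^2 is 2 A^T (A x - b)

x_min_norm = np.linalg.pinv(A) @ b       # minimum-norm solution A^+ b
print(np.allclose(x, x_min_norm, atol=1e-6))  # True: GD from 0 lands on the minimum-norm solution
```

The reason is that every gradient $2A^{T}(A\mathbf x-\mathbf b)$ lies in the row space of $A$, so iterates started at $0$ never leave that subspace, and the limit within it is $A^{+}\mathbf b$.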