
This could be quite a trivial question to answer, but I just wanted to be clearer. From the available literature and the discussion in What is the difference between Gradient Descent and Newton's Gradient Descent?, both methods involve computing a derivative and then moving towards a minimum. In the simple gradient-descent method, we calculate only the first-order derivative; in Newton's method, we also calculate the second-order derivative (the Hessian, in the multivariate case) and apply it in the update. Moreover, the update in Newton's method may not always be in the direction of the negative gradient.
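For concreteness, these are the textbook update rules I have in mind (just a sketch; the step size \alpha and the Hessian H are my own notation, not taken from the cited discussion):

```latex
% Gradient descent: step along the negative gradient, scaled by a step size \alpha
x_{k+1} = x_k - \alpha \, \nabla f(x_k)

% Newton's method: rescale the gradient by the inverse Hessian (local curvature),
% so the step need not point along -\nabla f(x_k)
x_{k+1} = x_k - H(x_k)^{-1} \, \nabla f(x_k)
```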

Moreover, for a given function f(x), both methods attempt to find a minimum that satisfies f'(x) = 0; in the gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x) = 0. Another difference is the stopping criterion, which in the gradient-descent method is f'(x) = 0, whereas in Newton's method it is f(x) = 0.

Based on the above arguments, would it be justified to say that Newton's method is an (advanced) example of a gradient-based optimisation method? The discussion cited above also falls short of answering this question.

Sal
  • I'm voting to close this question as off-topic because it's not about programming. The question might be on-topic in the [Mathematics](https://math.stackexchange.com/) Stack Exchange site. – ottomeister Jan 19 '20 at 00:32
  • I agree it is not directly related to programming, but in a way it is: it addresses the very basic approach towards programming a possible solution. I would request you to please reconsider. – Sal Jan 19 '20 at 02:21

1 Answer


in gradient-descent method, the objective is argmin f(x), whereas in Newton's method, the objective is f'(x)=0

That is not the case; both objectives are f'(x) = 0. With gradient descent, just as with Newton's method, you don't have any information on whether the minimum you have reached is global or local, so argmin f(x) holds only for a very small neighbourhood.

Another difference is stopping criterion, which in Gradient-descent method is f'(x) = 0, whereas in newton's method, it is f(x)=0

Again, that is incorrect. Both are trying to minimize a cost function f(x), and there is no guarantee whatsoever that the minimum value of f(x) will be zero. It could be any arbitrary value, so choosing f(x) = 0 as the stopping criterion would simply be wrong. A good criterion for stopping both methods is to look at how much f(x) changes over a few consecutive iterations. If it barely changes, you might conclude that you have reached a plateau and stop. As alternatives, you can use a criterion such as the gradient's absolute value, or, if you have time constraints, just a fixed number of iterations.
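As a rough sketch of what I mean (the function names and tolerances below are my own, purely for illustration), a generic descent loop combining these criteria could look like this:

```python
import numpy as np

def minimize(f, grad, step, x0, tol_f=1e-9, tol_g=1e-6, max_iter=1000):
    """Generic descent loop; `step` returns the next iterate
    (a gradient step or a Newton step, it makes no difference here)."""
    x = x0
    f_prev = f(x)
    for _ in range(max_iter):                    # time constraint: fixed iteration budget
        x = step(x)
        f_curr = f(x)
        if abs(f_prev - f_curr) < tol_f:         # f(x) barely changing -> plateau reached
            break
        if np.linalg.norm(grad(x)) < tol_g:      # gradient (almost) zero -> stationary point
            break
        f_prev = f_curr
    return x
```

Note that nothing in the loop checks whether f(x) itself is close to zero; with `step = lambda x: x - 0.1 * grad(x)` this is plain gradient descent, and swapping in a Newton step changes nothing about the stopping logic.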

would it be justified to say that Newton's method is an (advanced) example of gradient-based optimisation methods

By definition, a gradient method looks in the direction of the gradient. Newton's method, as you know, uses local curvature to define the path towards a local optimum, and might not follow the same direction as the gradient at all, so it just wouldn't make sense to call it gradient-based.
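As a tiny illustration (my own example, not something from the question): in 1-D, wherever the second derivative is negative, the Newton step points opposite to the negative gradient.

```python
# f(x) = x**4 - x**2 has a local maximum at x = 0 and minima at x = +/- 1/sqrt(2).
def fprime(x):   # first derivative of f
    return 4 * x**3 - 2 * x

def fsecond(x):  # second derivative of f (the 1-D "Hessian")
    return 12 * x**2 - 2

x = 0.1
gd_step = -0.1 * fprime(x)             # +0.0196: moves right, towards the minimum
newton_step = -fprime(x) / fsecond(x)  # about -0.104: moves left, towards the local maximum
print(gd_step, newton_step)
```

Starting from x = 0.1, gradient descent heads for the nearby minimum, while the pure Newton step heads for the stationary point at x = 0, which happens to be a maximum.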

Ash
    I think this makes sense; I had this understanding from all the literature I could find on the topic. And the article "G. Venter. Review of Optimization Techniques. Encyclopedia of Aerospace Engineering, 2010" specifically classifies Newton's Method as a gradient-descent method. Any comments? – Sal Jan 19 '20 at 02:20
  • @Sal My apologies, I was actually thinking about the `Gauss-Newton` method when I wrote this... I think that my answer holds for `Newton`'s method too, though. Regarding the classification of the method as gradient-based, I think that is more or less a matter of choice... In `Newton`'s method the gradient does appear explicitly in the update, and it reduces to gradient descent when the second derivatives are constant, so I think that it would make sense to classify it as a gradient method, although it seems like a stretch. – Ash Jan 20 '20 at 08:35
  • When the derivative is computed and appears explicitly in the update, why would it seem a stretch? I would say Newton's method does qualify as a gradient-based method. The way the update is computed may be different, but it works on the same underlying concept. – Sal Jan 21 '20 at 22:03