Questions tagged [numerical-optimization]

Numerical methods for continuous optimization.

Numerical Optimization is one of the central techniques in Machine Learning. For many problems it is hard to figure out the best solution directly, but it is relatively easy to set up a loss function that measures how good a solution is, and then find the solution by minimizing that function over its parameters.

Learn more about solving numerical optimization problems at the source below.

Source: http://www.benfrederickson.com/numerical-optimization/

1347 questions
143 votes, 4 answers

Partial derivative in gradient descent for two variables

I've started taking an online machine learning class, and the first learning algorithm that we are going to be using is a form of linear regression using gradient descent. I don't have much of a background in high level math, but here is what I…
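For readers wondering what those partial derivatives look like in practice, here is a minimal sketch, assuming the usual setup from such a course: a one-feature linear model $h(x) = \theta_0 + \theta_1 x$ and a mean-squared-error loss (the course's exact notation may differ).

```python
import numpy as np

def gradient_descent(x, y, lr=0.1, steps=1000):
    """Fit y ~ theta0 + theta1 * x by gradient descent on the MSE loss."""
    theta0, theta1 = 0.0, 0.0
    m = len(x)
    for _ in range(steps):
        residual = theta0 + theta1 * x - y        # h(x_i) - y_i for every sample
        grad_theta0 = residual.sum() / m          # partial derivative w.r.t. theta0
        grad_theta1 = (residual * x).sum() / m    # partial derivative w.r.t. theta1
        theta0 -= lr * grad_theta0                # simultaneous update of both parameters
        theta1 -= lr * grad_theta1
    return theta0, theta1

# toy data: y = 1 + 2x plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 100)
y = 1 + 2 * x + 0.01 * rng.normal(size=100)
print(gradient_descent(x, y))  # roughly (1.0, 2.0)
```

Both parameters are updated from the same residual vector computed before either update, which is the "simultaneous update" detail that often trips people up.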
60 votes, 2 answers

What is the difference between projected gradient descent and ordinary gradient descent?

I just read about projected gradient descent, but I did not see the intuition for using the projected version instead of normal gradient descent. Could you tell me the reasons for it and the situations where projected gradient descent is preferable? What does that projection…
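For a concrete picture, here is a minimal sketch of projected gradient descent on a hypothetical constraint set, the Euclidean unit ball, where the projection is just a rescaling; the only difference from ordinary gradient descent is the projection applied after each step.

```python
import numpy as np

def project_onto_unit_ball(x):
    """Euclidean projection onto the set {x : ||x|| <= 1}."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def projected_gradient_descent(grad, x0, lr=0.1, steps=200):
    """Ordinary gradient step followed by projection back onto the feasible set."""
    x = x0
    for _ in range(steps):
        x = project_onto_unit_ball(x - lr * grad(x))
    return x

# minimize ||x - c||^2 over the unit ball with c outside it; the solution is c / ||c||
c = np.array([3.0, 4.0])
grad = lambda x: 2 * (x - c)
print(projected_gradient_descent(grad, np.zeros(2)))  # approximately [0.6, 0.8]
```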
43 votes, 4 answers

Gradient descent with constraints

In order to find the local minima of a scalar function $p(x), x\in \mathbb{R}^3$, I know we can use the gradient descent method: $$x_{k+1}=x_k-\alpha_k \nabla_xp(x)$$ where $\alpha_k$ is the step size and $\nabla_xp(x)$ is the gradient of $p(x)$. My…
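When the constraints are simple bounds, a common variant of the same idea (closely related to the projected gradient sketch above) is to take the ordinary step $x_k - \alpha_k \nabla_x p(x_k)$ and then clip each coordinate back into its box; a sketch with made-up bounds:

```python
import numpy as np

def box_constrained_gd(grad, x0, lower, upper, alpha=0.05, steps=500):
    """Gradient descent where each iterate is projected onto the box [lower, upper]."""
    x = np.clip(x0, lower, upper)
    for _ in range(steps):
        x = np.clip(x - alpha * grad(x), lower, upper)  # step, then project
    return x

# example: p(x) = ||x - (2, 2, 2)||^2 restricted to the unit cube [0, 1]^3
target = np.array([2.0, 2.0, 2.0])
grad = lambda x: 2 * (x - target)
print(box_constrained_gd(grad, np.zeros(3), 0.0, 1.0))  # converges to [1., 1., 1.]
```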
29 votes, 3 answers

What is the definition of a first order method?

The term "first order method" is often used to categorize a numerical optimization method for say constrained minimization, and similarly for a "second order method". I wonder what is the exact definition of a first (or second) order method. Does…
22 votes, 1 answer

Gradient descent on non-convex function works. How?

For the Netflix Prize competition on recommendations, one method used stochastic gradient descent, popularized by Simon Funk, who used it to compute an approximate SVD. The math is better explained here on pg 152. A rating is predicted by…
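The heart of that method, roughly: each observed rating is predicted as a dot product of a user vector and an item vector, and both vectors are nudged along the gradient of the squared prediction error. A toy sketch (learning rate, regularization constant, and factor dimension are made-up values here, not Funk's):

```python
import numpy as np

def sgd_matrix_factorization(ratings, n_users, n_items, k=10,
                             lr=0.05, reg=0.02, epochs=200):
    """Funk-style SGD: ratings is a list of (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    P = 0.1 * rng.normal(size=(n_users, k))   # user factors
    Q = 0.1 * rng.normal(size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]             # prediction error for this rating
            pu, qi = P[u].copy(), Q[i].copy()
            P[u] += lr * (err * qi - reg * pu)
            Q[i] += lr * (err * pu - reg * qi)
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = sgd_matrix_factorization(ratings, n_users=2, n_items=3)
print(P[0] @ Q[0])  # predicted rating for (user 0, item 0); roughly the observed 5.0
```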
18 votes, 3 answers

Optimal step size in gradient descent

Suppose a differentiable, convex function $F(x)$ exists. Then $b = a - \gamma\nabla F(a)$ implies that $F(b) \leq F(a)$ given $\gamma$ is chosen properly. The goal is to find the optimal $\gamma$ at each step. In my book, in order to do this, one…
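Books differ in how they choose $\gamma$; one standard recipe is backtracking line search, which shrinks $\gamma$ until a sufficient-decrease (Armijo) condition holds. A sketch, not necessarily the exact rule used in the book in question:

```python
import numpy as np

def backtracking_step(F, gradF, a, gamma0=1.0, beta=0.5, c=1e-4):
    """Shrink gamma until the Armijo sufficient-decrease condition holds, then step."""
    g = gradF(a)
    gamma = gamma0
    while F(a - gamma * g) > F(a) - c * gamma * (g @ g):
        gamma *= beta
    return a - gamma * g, gamma

# example: F(x) = 1/2 x^T A x with an ill-conditioned diagonal A
A = np.diag([1.0, 100.0])
F = lambda x: 0.5 * x @ A @ x
gradF = lambda x: A @ x

x = np.array([1.0, 1.0])
for _ in range(20):
    x, gamma = backtracking_step(F, gradF, x)
print(x, F(x))
```

For the special case of a quadratic $F(x) = \tfrac{1}{2}x^TAx - b^Tx$, the exact minimizer along $-\nabla F(a)$ is available in closed form: $\gamma = \frac{g^Tg}{g^TAg}$ with $g = \nabla F(a)$.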
17 votes, 1 answer

Motivation for Mirror-Descent

I am trying to wrap my head around why mirror descent is such a popular optimization algorithm. Based on my reading, it seems like the main reason is that it improves upon the convergence rate of subgradient descent, while only using full gradient…
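For what it's worth, the textbook instance that usually motivates mirror descent is minimization over the probability simplex with the (negative) entropy as the mirror map, which turns the update into a multiplicative, exponentiated-gradient step; a minimal sketch:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, eta=0.1, steps=500):
    """Mirror descent with the entropy mirror map on the probability simplex.
    The update is multiplicative: x <- x * exp(-eta * grad(x)), then renormalize."""
    x = x0 / x0.sum()
    for _ in range(steps):
        x = x * np.exp(-eta * grad(x))
        x /= x.sum()
    return x

# example: minimize f(x) = <c, x> over the simplex; the optimum puts all mass on argmin(c)
c = np.array([3.0, 1.0, 2.0])
grad = lambda x: c
print(mirror_descent_simplex(grad, np.ones(3)))  # mass concentrates on index 1
```

The usual argument for this setup is that the resulting convergence bound depends only logarithmically on the dimension, whereas the Euclidean (sub)gradient bound picks up a polynomial dependence.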
12 votes, 2 answers

What Is the Difference Between Interior Point Methods, Active Set Methods, Cutting Plane Methods and Proximal Gradient Methods?

Can you help me understand the basic differences between Interior Point Methods, Active Set Methods, Cutting Plane Methods and Proximal Methods? What is the best method and why? What are the pros and cons of each method? What is the geometric intuition…
12 votes, 1 answer

Quasi-newton methods: Understanding DFP updating formula

In the Nocedal/Wright Numerical Optimization book, on pages 138-139, the approximate Hessian $B_k$ update for the quasi-Newton (DFP) method is: $$B_{k+1} = \left(I-\frac{y_ks_k^T}{y_k^Ts_k}\right)B_k\left(I-\frac{s_ky_k^T}{y_k^Ts_k}\right)+…
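A direct transcription makes the formula easier to check (the excerpt is cut off; the full DFP update for the Hessian approximation, as I recall it from the same book, ends with an added $\frac{y_ky_k^T}{y_k^Ts_k}$ term):

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update of the Hessian approximation B, given
    s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k)."""
    rho = 1.0 / (y @ s)                    # requires y^T s > 0 (curvature condition)
    I = np.eye(len(s))
    V = I - rho * np.outer(y, s)
    return V @ B @ V.T + rho * np.outer(y, y)

# sanity check on a quadratic f(x) = 1/2 x^T A x:
# the updated matrix must satisfy the secant equation B_{k+1} s = y
A = np.array([[2.0, 0.3], [0.3, 1.0]])
x0, x1 = np.array([1.0, 1.0]), np.array([0.4, 0.7])
s, y = x1 - x0, A @ x1 - A @ x0
B_new = dfp_update(np.eye(2), s, y)
print(np.allclose(B_new @ s, y))  # True
```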
11 votes, 3 answers

Why does gradient descent work?

Wikipedia gives the following description of gradient descent: Gradient descent is based on the observation that if the multivariable function $F(\mathbf{x})$ is defined and differentiable in a neighborhood of a point $\mathbf{a}$, then…
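The one-line reasoning behind that observation: by a first-order Taylor expansion around $\mathbf{a}$,
$$F(\mathbf{a} - \gamma\nabla F(\mathbf{a})) \approx F(\mathbf{a}) - \gamma\,\nabla F(\mathbf{a})^T\nabla F(\mathbf{a}) = F(\mathbf{a}) - \gamma\,\|\nabla F(\mathbf{a})\|^2,$$
so for a sufficiently small step $\gamma > 0$ the function value strictly decreases unless the gradient is already zero.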
11 votes, 6 answers

Difference between Gradient Descent method and Steepest Descent

What is the difference between the Gradient Descent method and Steepest Descent methods? In this book they appear in different sections: http://stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf According to page 480, Gradient Descent is: $$\Delta…
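For reference, the distinction in that book comes down to the choice of norm: gradient descent uses the direction $\Delta x = -\nabla f(x)$, while the normalized steepest descent direction is defined as
$$\Delta x_{\text{nsd}} = \operatorname{argmin}\{\nabla f(x)^T v \;:\; \|v\| = 1\}$$
for an arbitrary norm $\|\cdot\|$; with the Euclidean norm the two coincide (this is stated from memory of the cited chapter, so check the exact definitions there).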
11 votes, 1 answer

Gauss-Newton versus gradient descent

I would first like to ask whether the second-order gradient descent method is the same as the Gauss-Newton method. There is something I didn't understand. I read that with Newton's method the step we take in each iteration is along a quadratic curve…
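To make the comparison concrete, here is a sketch of a single Gauss-Newton step for a least-squares objective $f(x) = \tfrac{1}{2}\|r(x)\|^2$: instead of moving along the raw gradient $J^Tr$, it solves a small linear system built from the Jacobian (the model and data below are made up for illustration).

```python
import numpy as np

def gauss_newton_step(residual, jacobian, x):
    """One Gauss-Newton step for f(x) = 1/2 ||r(x)||^2:
    solve (J^T J) p = -J^T r instead of stepping along the raw gradient J^T r."""
    r, J = residual(x), jacobian(x)
    p = np.linalg.solve(J.T @ J, -J.T @ r)
    return x + p

# toy curve fit: model y = exp(a*t), estimate a (here the parameter vector is x = [a])
t = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * t)
residual = lambda x: np.exp(x[0] * t) - y
jacobian = lambda x: (t * np.exp(x[0] * t)).reshape(-1, 1)

x = np.array([0.0])
for _ in range(10):
    x = gauss_newton_step(residual, jacobian, x)
print(x)  # approximately [0.7]
```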
10 votes, 5 answers

Finding good approximation for $x^{1/2.4}$

I would like to find a good (8-bit accuracy) approximation for $x^{1/2.4}$ in the range $[0, 1]$. This transform is used for converting linear intensities to sRGB-compressed values, so it's important that I make it run fast. Plot of function: Using a…
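One way to explore this kind of question numerically (a sketch only, with no guarantee of reaching 8-bit accuracy): change variables to tame the infinite slope at zero, fit a low-degree Chebyshev polynomial, and measure the worst-case error.

```python
import numpy as np

# target: f(x) = x**(1/2.4) on [0, 1]; the slope blows up at 0, so fit in t = x**(1/4),
# where the target g(t) = t**(4/2.4) is better behaved.
x = np.linspace(0.0, 1.0, 10001)
t = x ** 0.25
target = x ** (1.0 / 2.4)

deg = 7  # degree is a knob to experiment with
cheb = np.polynomial.Chebyshev.fit(t, target, deg)

max_err = np.max(np.abs(cheb(t) - target))
print(f"degree {deg}: max abs error = {max_err:.2e}")  # 8-bit accuracy needs roughly < 2e-3
```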
10 votes, 2 answers

What's the intuition behind the conjugate gradient method?

I have been searching for an intuitive explanation of the conjugate gradient method (as it relates to gradient descent) for at least two years without luck. I even find articles like "An Introduction to the Conjugate Gradient Method Without the…
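For reference while reading those articles, here is a bare-bones conjugate gradient loop for the quadratic case, minimizing $\tfrac{1}{2}x^TAx - b^Tx$ (equivalently solving $Ax=b$); the contrast with gradient descent is that each new search direction is corrected so that it stays $A$-conjugate to the previous ones.

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10):
    """Minimize 1/2 x^T A x - b^T x for symmetric positive definite A, i.e. solve Ax = b."""
    x = np.zeros_like(b) if x0 is None else x0
    r = b - A @ x          # residual = negative gradient at x
    p = r.copy()           # first direction is the steepest descent direction
    rs = r @ r
    for _ in range(len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)        # exact line search along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p    # new direction: residual plus a correction keeping A-conjugacy
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # matches np.linalg.solve(A, b)
```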
9 votes, 2 answers

What Numerical Methods Are Known to Solve $ {L}_{1} $ Regularized Quadratic Programming Problems?

What numerical methods are suitable to solve the following problem $$\min_x \tfrac{1}{2}x^T A x + b^Tx + \lambda ||x||_1$$ where $x,b\in\mathbf{R}^n$, and $A\in \mathbf{R}^{n\times n}$ is positive definite, and $\lambda\in\mathbf{R}$ is…
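One standard candidate (a sketch, not a claim about the best method) is the proximal gradient method, ISTA: take a gradient step on the smooth quadratic part and then apply soft thresholding, which is the proximal operator of $\lambda\|x\|_1$.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, steps=2000):
    """Proximal gradient (ISTA) for min 1/2 x^T A x + b^T x + lam * ||x||_1, A positive definite."""
    L = np.linalg.eigvalsh(A).max()   # Lipschitz constant of the gradient A x + b
    x = np.zeros(len(b))
    for _ in range(steps):
        grad = A @ x + b              # gradient of the smooth quadratic part
        x = soft_threshold(x - grad / L, lam / L)
    return x

A = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([-1.0, 0.2])
print(ista(A, b, lam=0.3))
```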