
I have a function which is actually a call to another program (some Fortran code). When I call this function (run_moog) I pass 4 variables, and it returns 6 values. These values should all be close to 0 (that is what I want to minimize), so I combine them into a single scalar with np.sum(results**2). I would like to minimize this scalar function, i.e. get np.sum(results**2) as close to zero as possible.
Note: when run_moog receives the 4 input parameters, it writes an input file for the Fortran code that depends on those parameters.
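For reference, the wrapper I minimize looks roughly like this (the exact run_moog signature is simplified here):

import numpy as np

def fun_mmog(x):
    # x holds the 4 parameters; run_moog writes the Fortran input file,
    # runs the code and returns the 6 values.
    results = np.asarray(run_moog(*x))
    return np.sum(results**2)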

I have tried several of the optimizers from the scipy docs, but none works as expected. The minimization must support bounds on the 4 variables. Here is an attempt:

from scipy.optimize import minimize # Tried others as well from the docs
x0 = 4435, 3.54, 0.13, 2.4
bounds = [(4000, 6000), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
a = minimize(fun_mmog, x0, bounds=bounds, method='L-BFGS-B')  # I've tried several different methods here
print(a)

This then gives me

  status: 0
 success: True
    nfev: 5
     fun: 2.3194639999999964
       x: array([  4.43500000e+03,   3.54000000e+00,   1.00000000e-01,
         2.40000000e+00])
 message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
     jac: array([ 0., 0., -54090399.99999981, 0.])
     nit: 0

The third parameter changes slightly, while the others stay exactly the same. There are also 5 function calls (nfev) but no iterations (nit). The full scipy output is shown above.

  • It seems you got stuck with very similar values for your function before even iterating once. Perhaps a method like Nelder-Mead (which doesn't use derivatives) with small ftol and xtol parameters might get unstuck. IMHO, a grid search might be the best place to begin with (and then give the initial values to minimize). Are you sure that you do not get NaNs or Infs? (Sometimes, even with theoretical bounds set for the parameters, an algorithm returns one of these if it is not numerically stable.) – Cristián Antuña Apr 24 '15 at 19:50
  • The problem is that the step used to calculate the approximate gradient is too small, hence the zeros in the jacobian matrix. Try adding `options={'eps': 1e-4}` with `method='L-BFGS-B'`, or some larger value (it's `1e-8` by default), until your jacobian matrix doesn't have zeros; see the snippet after these comments. – rth Apr 24 '15 at 20:25
  • Both solutions seem to help. However, I would like to use one of the three methods where I can use constraints (`BFGS`, `L-BFGS-B`, `SLSQP`). With `eps: 1e0` it runs, but at some point it steps outside my constraints. The only thing added to the code above is `, options={'eps': 1e+0})` in the `minimize` call. – Daniel Thaagaard Andreasen Apr 24 '15 at 21:22
  • Also, it doesn't really make sense that the step size is the same for all the variables. The first variable is of the order `1e4`, while the rest are of order 1. Can `epsilon` be a list of values? – Daniel Thaagaard Andreasen Apr 24 '15 at 21:29
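For reference, the step-size option discussed in the comments is passed like this (the value is illustrative and needs tuning):

a = minimize(fun_mmog, x0, bounds=bounds, method='L-BFGS-B',
             options={'eps': 1e-2})  # larger finite-difference step than the 1e-8 default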

4 Answers


Couple of possibilities:

  1. Try COBYLA. It is derivative-free and supports inequality constraints (bounds can be rewritten as inequality constraints if needed).
  2. You can't use different epsilons via the normal interface, so try scaling your first variable by 1e4. (Divide it going in, multiply it coming back out; a sketch appears at the end of this answer.)
  3. Skip the normal automatic jacobian constructor, and make your own:

Say you're trying to use SLSQP and you don't provide a jacobian function. minimize makes one for you with finite differences; the code for it is approx_jacobian in slsqp.py. Here's a condensed version:

from numpy import asfarray, atleast_1d, zeros

def approx_jacobian(x, func, epsilon, *args):
    x0 = asfarray(x)
    f0 = atleast_1d(func(*((x0,) + args)))
    jac = zeros([len(x0), len(f0)])
    dx = zeros(len(x0))
    for i in range(len(x0)):
        # Forward difference along coordinate i, one shared step size for all.
        dx[i] = epsilon
        jac[i] = (func(*((x0 + dx,) + args)) - f0) / epsilon
        dx[i] = 0.0

    return jac.transpose()

You could try replacing that loop with:

    for i, e in zip(range(len(x0)), epsilon):
        # epsilon is now a sequence: one step size per variable.
        dx[i] = e
        jac[i] = (func(*((x0 + dx,) + args)) - f0) / e
        dx[i] = 0.0

You can't pass approx_jacobian itself as the jacobian to minimize (the signatures don't match), but wrapping it up for that is straightforward:

def construct_jacobian(func, epsilon):
    # epsilon is a sequence of per-variable step sizes.
    def jac(x, *args):
        x0 = asfarray(x)
        f0 = atleast_1d(func(*((x0,) + args)))
        jac = zeros([len(x0), len(f0)])
        dx = zeros(len(x0))
        for i, e in zip(range(len(x0)), epsilon):
            dx[i] = e
            jac[i] = (func(*((x0 + dx,) + args)) - f0) / e
            dx[i] = 0.0

        return jac.transpose()
    return jac

You can then call minimize like:

minimize(fun_mmog, x0,
         jac=construct_jacobian(fun_mmog, [1e0, 1e-4, 1e-4, 1e-4]),
         bounds=bounds, method='SLSQP')
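For option 2 (rescaling the first variable so all parameters are of order one), a minimal sketch, assuming fun_mmog, x0 and bounds as defined in the question:

import numpy as np
from scipy.optimize import minimize

scale = np.array([1e4, 1.0, 1.0, 1.0])

def fun_scaled(z):
    # Undo the scaling before calling the original (expensive) objective.
    return fun_mmog(z * scale)

z0 = np.asarray(x0) / scale
bounds_scaled = [(lo / s if lo is not None else None,
                  hi / s if hi is not None else None)
                 for (lo, hi), s in zip(bounds, scale)]

res = minimize(fun_scaled, z0, bounds=bounds_scaled, method='L-BFGS-B')
x_best = res.x * scale  # map the solution back to the original units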
– Jay Kominek

It sounds like your target function doesn't have well-behaved derivatives. The line jac: array([ 0., 0., -54090399.99999981, 0.]) in the output means that only changing the third variable has a significant effect. And because the derivative w.r.t. this variable is virtually infinite, there is probably something wrong in the function. That is also why the third variable ends up at its upper bound.

I would suggest that you take a look at the derivatives, at least at a few points in your parameter space. Compute them using finite differences with the default step size of SciPy's fmin_l_bfgs_b, 1e-8; scipy.optimize.approx_fprime does exactly this.
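A sketch (fun_mmog and x0 as in the question):

import numpy as np
from scipy.optimize import approx_fprime

x0 = np.array([4435., 3.54, 0.13, 2.4])
grad = approx_fprime(x0, fun_mmog, 1e-8)  # forward differences with step 1e-8
print(grad)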

Try also plotting your target function. For instance, keep two of the parameters constant and let the other two vary. If the function has multiple local optima, you shouldn't use gradient-based methods like BFGS.
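Something along these lines fixes the first two parameters and scans the last two (a sketch; the grid ranges, especially the cap on the open-ended fourth parameter, are guesses):

import numpy as np
import matplotlib.pyplot as plt

p3 = np.linspace(-0.1, 0.1, 25)
p4 = np.linspace(0.0, 5.0, 25)   # the 4th parameter has no upper bound; 5.0 is arbitrary
Z = np.array([[fun_mmog((4435., 3.54, a, b)) for a in p3] for b in p4])

plt.contourf(p3, p4, Z, 30)      # note: this costs 625 calls to the Fortran code
plt.colorbar()
plt.xlabel('parameter 3')
plt.ylabel('parameter 4')
plt.show()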

– jasaarim
  • The "problem" was that the small step-size did not change anything in the other parameters. Then suddenly there is a small change in the third parameter, and the derivative is huge. The solution at the moment is to bring all four variables to the same order of magnitude, which helps a little. It is a good idea to make a plot; I didn't even think of that. Thank you. – Daniel Thaagaard Andreasen May 17 '15 at 12:18

How difficult is it to get an analytical expression for the gradient? If you have one, you can approximate the product of the Hessian with a vector using finite differences, and then use the Newton-type optimization routines that are available.
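For instance, given an analytical gradient (grad_mmog below is a hypothetical function returning it), a finite-difference Hessian-vector product can be passed to a method such as Newton-CG (a sketch; note that Newton-CG ignores bounds, unlike TNC mentioned below):

from scipy.optimize import minimize

def hessp(x, p, r=1e-6):
    # Approximate H(x) @ p by differencing the analytical gradient along p.
    return (grad_mmog(x + r * p) - grad_mmog(x)) / r

# fun_mmog and x0 as in the question
res = minimize(fun_mmog, x0, jac=grad_mmog, hessp=hessp, method='Newton-CG')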

Among the various optimization routines available in SciPy, the truncated Newton method (TNC) is quite robust to the numerical scale of the problem, and it also accepts bound constraints.

– Hari

The Nelder-Mead simplex method (suggested by Cristián Antuña in the comments above) is well known to be a good choice for optimizing (possibly ill-behaved) functions with no knowledge of derivatives (see Numerical Recipes in C, Chapter 10).

There are two somewhat specific aspects to your question. The first is the constraints on the inputs, and the second is a scaling problem. The following suggests solutions to these points, but you might need to manually iterate between them a few times until things work.

Input Constraints

Assuming your input constraints form a convex region (as your examples above indicate, but I'd like to generalize it a bit), you can write a function

def is_in_bounds(p):
    # Return True if p lies inside the feasible region.
    ...
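For the box bounds from the question, a minimal version might look like:

def is_in_bounds(p):
    # Box bounds from the question; None means unbounded on that side.
    bounds = [(4000, 6000), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
    return all((lo is None or lo <= v) and (hi is None or v <= hi)
               for v, (lo, hi) in zip(p, bounds))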

Using this function, assume that the algorithm wants to move from a point from_ to a point to, where from_ is known to be in the region. Then the following function will efficiently find (to within eps) the furthest point along the segment from from_ to to that is still inside the region:

from numpy.linalg import norm

def progress_within_bounds(from_, to, eps):
    """
    from_ -- source point (known to be in the region)
    to -- target point
    eps -- Euclidean precision along the line
    """
    if norm(to - from_) < eps:
        return from_
    mid = (from_ + to) / 2
    if is_in_bounds(mid):
        return progress_within_bounds(mid, to, eps)
    return progress_within_bounds(from_, mid, eps)

(Note that this function can be optimized for some regions, but it's hardly worth the bother, as it doesn't even call your original objective function, which is the expensive one.)
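For instance, with the is_in_bounds sketched above (values illustrative):

import numpy as np

from_ = np.array([4435., 3.54, 0.0, 2.4])       # known to be inside the region
to = np.array([4435., 3.54, 0.5, 2.4])          # proposed point, outside the bounds
safe = progress_within_bounds(from_, to, 1e-6)  # in-bounds point near the boundary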

One of the nice aspects of Nelder-Mead is that it proceeds through a series of quite intuitive steps. Some of these steps can obviously throw you out of the region, but that is easy to modify. Here is an implementation of Nelder-Mead with the modifications marked between pairs of lines of the form ##################################################################:

import copy

'''
    Pure Python/Numpy implementation of the Nelder-Mead algorithm.
    Reference: https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
'''


def nelder_mead(f, x_start,
        step=0.1, no_improve_thr=10e-6, no_improv_break=10, max_iter=0,
        alpha=1., gamma=2., rho=-0.5, sigma=0.5, prog_eps=1e-6):
    '''
        @param f (function): function to optimize, must return a scalar score 
            and operate over a numpy array of the same dimensions as x_start
        @param x_start (numpy array): initial position
        @param step (float): look-around radius in initial step
        @no_improve_thr, no_improv_break (float, int): break after no_improv_break iterations with
            an improvement lower than no_improve_thr
        @max_iter (int): always break after this number of iterations.
            Set it to 0 to loop indefinitely.
        @alpha, gamma, rho, sigma (floats): parameters of the algorithm
            (see Wikipedia page for reference)
        @prog_eps (float): precision passed to progress_within_bounds
            in the bound-handling modifications below
    '''

    # init
    dim = len(x_start)
    prev_best = f(x_start)
    no_improv = 0
    res = [[x_start, prev_best]]

    for i in range(dim):
        x = copy.copy(x_start)
        x[i] = x[i] + step
        score = f(x)
        res.append([x, score])

    # simplex iter
    iters = 0
    while 1:
        # order
        res.sort(key = lambda x: x[1])
        best = res[0][1]

        # break after max_iter
        if max_iter and iters >= max_iter:
            return res[0]
        iters += 1

        # break after no_improv_break iterations with no improvement
        print('...best so far:', best)

        if best < prev_best - no_improve_thr:
            no_improv = 0
            prev_best = best
        else:
            no_improv += 1

        if no_improv >= no_improv_break:
            return res[0]

        # centroid
        x0 = [0.] * dim
        for tup in res[:-1]:
            for i, c in enumerate(tup[0]):
                x0[i] += c / (len(res)-1)

        # reflection
        xr = x0 + alpha*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        xr = progress_within_bounds(x0, x0 + alpha*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ##################################################################
        rscore = f(xr)
        if res[0][1] <= rscore < res[-2][1]:
            del res[-1]
            res.append([xr, rscore])
            continue

        # expansion
        if rscore < res[0][1]:
            xe = x0 + gamma*(x0 - res[-1][0])
            ##################################################################
            ##################################################################
            xe = progress_within_bounds(x0, x0 + gamma*(x0 - res[-1][0]), prog_eps)
            ##################################################################
            ################################################################## 
            escore = f(xe)
            if escore < rscore:
                del res[-1]
                res.append([xe, escore])
                continue
            else:
                del res[-1]
                res.append([xr, rscore])
                continue

        # contraction
        xc = x0 + rho*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        xc = progress_within_bounds(x0, x0 + rho*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ################################################################## 
        cscore = f(xc)
        if cscore < res[-1][1]:
            del res[-1]
            res.append([xc, cscore])
            continue

        # reduction
        x1 = res[0][0]
        nres = []
        for tup in res:
            redx = x1 + sigma*(tup[0] - x1)
            score = f(redx)
            nres.append([redx, score])
        res = nres

Note: this implementation is GPL-licensed, which may or may not be fine for you. NM is extremely easy to modify from any pseudocode, though, and you might want to throw in simulated annealing in any case.
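A call on the original problem might then look like this (a sketch; fun_mmog and is_in_bounds as above):

import numpy as np

x_start = np.array([4435., 3.54, 0.13, 2.4])
best_x, best_score = nelder_mead(fun_mmog, x_start, step=0.1, prog_eps=1e-6)
# step=0.1 is tiny relative to the first variable; rescaling (see Scaling below) helps.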

Scaling

This is a trickier problem, but jasaarim has made an interesting point regarding that. Once the modified NM algorithm has found a point, you might want to run matplotlib.contour while fixing a few dimensions, in order to see how the function behaves. At this point, you might want to rescale one or more of the dimensions, and rerun the modified NM.

– Ami Tavory