
I do a lot of simulations in which I often need to minimise complicated user-defined functions, for which I generally use numpy and scipy.optimize.minimize(). The problem with this is that I need to explicitly write down a gradient function, which can sometimes be very difficult or impossible to find. And for high-dimensional vectors, the numerical derivatives that scipy falls back on are prohibitively expensive.

So I am trying to switch to TensorFlow or PyTorch, both to take advantage of their automatic differentiation capabilities and to be able to exploit GPUs freely. Let me give an explicit example of a function whose derivative is somewhat tedious to write down (it would require a lot of chain rule) and thus seems ripe for TensorFlow or PyTorch: calculating the dihedral angle between the two triangles formed by four points in 3D space.

import numpy as np

def dihedralAngle(xyz):
## calculate dihedral angle between 4 nodes

    p1, p2, p3, p4 = 0, 1, 2, 3

    ## get unit normal vectors
    N1 = np.cross(xyz[p1]-xyz[p3] , xyz[p2]-xyz[p3])
    N2 = - np.cross(xyz[p1]-xyz[p4] , xyz[p2]-xyz[p4])
    n1, n2 = N1 / np.linalg.norm(N1), N2 / np.linalg.norm(N2) 

    angle = np.arccos(np.dot(n1, n2))

    return angle

xyz1 = np.array([[0.2       , 0.        , 0.        ],
       [0.198358  , 0.02557543, 0.        ],
       [0.19345897, 0.05073092, 0.        ],
       [0.18538335, 0.0750534 , 0.        ]]) # or any (4,3) ndarray

print(dihedralAngle(xyz1))  # prints 3.141...

I could easily minimise this using scipy.optimize.minimize(), and I should get 0. For such a small function, I don't really need a gradient (explicit or numerical). However, if I wish to iterate over many, many nodes and minimise some function that depends on all the dihedral angles, then the overhead of numerical differentiation becomes much higher, right?
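For reference, the scipy baseline I currently use looks roughly like this (just a sketch; note that arccos is numerically ill-behaved where the normals are exactly (anti-)parallel, so the starting point may need a small perturbation):

from scipy.optimize import minimize

# flatten the (4, 3) array for scipy and reshape it inside the objective;
# nudge the start off the singular arccos point at angle = pi
x0 = xyz1.ravel() + 1e-3 * np.random.randn(12)
res = minimize(lambda x: dihedralAngle(x.reshape(4, 3)), x0)
print(res.fun)  # should approach 0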

My questions then --

  1. How would I implement this minimisation problem using TensorFlow or PyTorch, both for a single dihedral angle and for a list of such angles (i.e. accounting for looping over many sets of nodes)?
  2. Also, could I just get the gradient using automatic differentiation and plug it back into scipy.optimize.minimize() if desired, along the lines of the sketch below? For example, scipy.optimize.minimize() easily allows for bounds and constraints, something that I haven't noticed in the TensorFlow or PyTorch optimisation modules.
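What I have in mind for question 2 is this untested sketch, where torch_dihedral is a hypothetical PyTorch port of dihedralAngle above:

import torch
from scipy.optimize import minimize

def value_and_grad(x_flat):
    # rebuild a differentiable torch tensor from scipy's flat numpy vector
    x = torch.tensor(x_flat.reshape(4, 3), requires_grad=True)
    f = torch_dihedral(x)  # hypothetical torch port of dihedralAngle
    f.backward()
    return f.item(), x.grad.numpy().ravel()

# jac=True tells scipy the objective returns (value, gradient);
# L-BFGS-B would also accept bounds=...
res = minimize(value_and_grad, xyz1.ravel(), jac=True, method='L-BFGS-B')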

2 Answers


Here is a solution using automatic computation of the gradient by torch, plugged into the scipy minimizer, with a lib I wrote, autograd-minimize. The advantage over SGD is better precision of the estimate (scipy uses second-order methods). It is probably equivalent to using LBFGS from torch:

import numpy as np
import torch
from autograd_minimize import minimize


def compute_angle(p1, p2):
    # batched inner product over the last axis (torch.dot only handles 1-D)
    inner_product = (p1 * p2).sum(-1)
    p1_norm = torch.linalg.norm(p1, dim=-1)
    p2_norm = torch.linalg.norm(p2, dim=-1)
    cos = inner_product / (p1_norm * p2_norm)
    # clamp so that acos (and its gradient) stays finite for (anti-)parallel vectors
    cos = torch.clamp(cos, -0.99999, 0.99999)
    angle = torch.acos(cos)
    return angle

def compute_dihedral(v1, v2, v3, v4):
    ab = v1 - v2
    cb = v3 - v2
    db = v4 - v3
    u = torch.cross(ab, cb, dim=-1)
    v = torch.cross(db, cb, dim=-1)
    w = torch.cross(u, v, dim=-1)
    angle = compute_angle(u, v)
    # w is (anti-)parallel to cb, so the angle between them (0 or pi) fixes the sign
    angle = torch.where(compute_angle(cb, w) > 1, -angle, angle)

    return angle

def loss_func(v1,v2,v3,v4):
    return ((compute_dihedral(v1,v2,v3,v4)+2)**2).mean()


x0=[np.array([-17.0490,   5.9270,  21.5340]),
    np.array([-0.1608,  0.0600, -0.0371]),
    np.array([-0.2000,  0.0007, -0.0927]),
    np.array([-0.1423,  0.0197, -0.0727])]

res = minimize(loss_func, x0, backend='torch')

print(compute_dihedral(*[torch.tensor(v) for v in res.x])) 
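Note that compute_angle reduces over the last axis and the cross products are taken along dim=-1, so the same functions batch over many quadruples of nodes without an explicit Python loop. A sketch of batched usage (my assumption about the intended shapes, reusing loss_func and compute_dihedral from above):

N = 100  # number of dihedral angles, optimised jointly
x0_batched = [np.random.randn(N, 3) for _ in range(4)]

res = minimize(loss_func, x0_batched, backend='torch')
# one angle per quadruple of nodes
print(compute_dihedral(*[torch.tensor(v) for v in res.x]).shape)  # (100,)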

I'm working on the same thing. Here is what I got.

import torch

def compute_angle(p1, p2):
    # batched inner product over the last axis (torch.dot only handles 1-D)
    inner_product = (p1 * p2).sum(-1)
    p1_norm = torch.linalg.norm(p1, dim=-1)
    p2_norm = torch.linalg.norm(p2, dim=-1)
    cos = inner_product / (p1_norm * p2_norm)
    # clamp so that acos (and its gradient) stays finite for (anti-)parallel vectors
    cos = torch.clamp(cos, -0.99999, 0.99999)
    angle = torch.acos(cos)
    return angle

def compute_dihedral(v1, v2, v3, v4):
    ab = v1 - v2
    cb = v3 - v2
    db = v4 - v3
    u = torch.cross(ab, cb, dim=-1)
    v = torch.cross(db, cb, dim=-1)
    w = torch.cross(u, v, dim=-1)
    angle = compute_angle(u, v)
    # w is (anti-)parallel to cb, so the angle between them (0 or pi) fixes the sign
    angle = torch.where(compute_angle(cb, w) > 1, -angle, angle)
    return angle

v1 = torch.tensor([-17.0490,   5.9270,  21.5340], requires_grad=True)
v2 = torch.tensor([-0.1608,  0.0600, -0.0371], requires_grad=True)
v3 = torch.tensor([-0.2000,  0.0007, -0.0927], requires_grad=True)
v4 = torch.tensor([-0.1423,  0.0197, -0.0727], requires_grad=True)

dihedral = compute_dihedral(v1,v2,v3,v4)
target_dihedral = -2

print(dihedral)   # should print -2.6387


for i in range(100):
    dihedral = compute_dihedral(v1,v2,v3,v4)
    loss = (dihedral - target_dihedral)**2
    loss.backward()
    learning_rate = 0.001
    with torch.no_grad():
        v1 -= learning_rate * v1.grad
        v2 -= learning_rate * v2.grad
        v3 -= learning_rate * v3.grad
        v4 -= learning_rate * v4.grad

        # Manually zero the gradients after updating weights
        v1.grad = None
        v2.grad = None
        v3.grad = None
        v4.grad = None
print(compute_dihedral(v1,v2,v3,v4))   # should print -2
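The manual update-and-zero bookkeeping can also be delegated to a built-in optimiser; here is a sketch of the same loop using torch.optim.SGD (my addition, with the same learning rate):

optimizer = torch.optim.SGD([v1, v2, v3, v4], lr=0.001)

for i in range(100):
    optimizer.zero_grad()  # clear stale gradients
    loss = (compute_dihedral(v1, v2, v3, v4) - target_dihedral)**2
    loss.backward()        # autodiff through the dihedral computation
    optimizer.step()       # in-place gradient descent update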