I'm trying to get some insight into how PyTorch works by implementing Newton's method for solving x = cos(x), i.e. finding the root of f(x) = x - cos(x) via the update x ← x - f(x)/f'(x), where f'(x) = 1 + sin(x). Here's a version that works:
import torch
from torch import DoubleTensor
from torch.autograd import Variable

x = Variable(DoubleTensor([1]), requires_grad=True)
for i in range(5):
    y = x - torch.cos(x)  # f(x) = x - cos(x)
    y.backward()          # writes f'(x) = 1 + sin(x) into x.grad
    x = Variable(x.data - y.data/x.grad.data, requires_grad=True)  # Newton step
print(x.data)  # tensor([0.7390851332151607], dtype=torch.float64) (correct)
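(For what it's worth, I believe the same iteration can be written without Variable in newer PyTorch; a minimal sketch, assuming a version >= 0.4 where tensors carry requires_grad directly and torch.autograd.grad is available:

import torch

x = torch.tensor([1.0], dtype=torch.float64, requires_grad=True)
for i in range(5):
    y = x - torch.cos(x)
    # torch.autograd.grad returns dy/dx directly, without writing into x.grad
    (dydx,) = torch.autograd.grad(y, x)
    # rebuild x as a fresh leaf tensor, mirroring the Variable version above
    x = (x.detach() - y.detach()/dydx).requires_grad_(True)
print(x)  # should converge to 0.73908513..., the fixed point of cos

)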
This code seems inelegant (and maybe inefficient?) to me, since it recreates the entire computational graph during each step of the for loop (right?). I tried to avoid this by simply updating the data held by each of the variables instead of recreating them:
x = Variable(DoubleTensor([1]), requires_grad=True)
y = x - torch.cos(x)
y.backward(retain_graph=True)
for i in range(5):
    x.data = x.data - y.data/x.grad.data  # Newton step, in place
    y.data = x.data - torch.cos(x.data)   # manually refresh y's value
    y.backward(retain_graph=True)         # re-run backward on the same graph
print(x.data)  # tensor([0.7417889255761136], dtype=torch.float64) (wrong)
It seems like, with DoubleTensors, I'm carrying enough digits of precision to rule out round-off error. So where is the error coming from?
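One way to narrow it down might be to print x.grad after each backward() call and compare it against the hand-computed derivative f'(x) = 1 + sin(x), which I would expect them to match (a minimal diagnostic sketch of the loop above):

import torch
from torch import DoubleTensor
from torch.autograd import Variable

x = Variable(DoubleTensor([1]), requires_grad=True)
y = x - torch.cos(x)
y.backward(retain_graph=True)
for i in range(5):
    x.data = x.data - y.data/x.grad.data
    y.data = x.data - torch.cos(x.data)
    y.backward(retain_graph=True)
    # compare autograd's gradient against the hand-computed f'(x) = 1 + sin(x)
    print(x.grad.data, (1 + torch.sin(x)).data)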
Possibly related: the above snippet breaks without the retain_graph=True flag set at every step of the for loop. The error message I get if I omit it inside the loop --- but retain it on line 3, the initial backward() call before the loop --- is:
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

This seems like evidence that I'm misunderstanding something...
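If I understand the error correctly, it isn't Newton-specific and should reproduce on any graph where backward() is called twice; a minimal sketch:

import torch
from torch import DoubleTensor
from torch.autograd import Variable

a = Variable(DoubleTensor([1]), requires_grad=True)
b = a * a
b.backward()  # first backward succeeds and frees the graph's buffers
b.backward()  # RuntimeError: Trying to backward through the graph a second time ...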