26

I think it's a pretty common message for PyTorch users with low GPU memory:

RuntimeError: CUDA out of memory. Tried to allocate  MiB (GPU ;  GiB total capacity;  GiB already allocated;  MiB free;  cached)

I want to research object detection algorithms for my coursework, but many deep learning architectures require a large amount of GPU memory, so my machine can't train those models. I tried to process an image by loading each layer to the GPU and then moving it back to the CPU:

for m in self.children():
    m.cuda()                   # move this layer to the GPU
    X = m(X)                   # run the forward pass for this layer
    m.cpu()                    # move the layer back to the CPU
    torch.cuda.empty_cache()   # release the cached GPU memory

But it doesn't seem to be very effective. I'm wondering if there are any tips and tricks to train large deep learning models while using little GPU memory. Thanks in advance!

Edit: I'm a beginner in deep learning. Apologies if it's a silly question :)

voilalex
  • What's up with the smileys? lol.. Also, decrease your batch size and/or train on smaller images. Look at the Apex library for mixed precision training. Finally, when decreasing the batch size to, for example, 1 you might want to hold off on setting the gradients to zero after every iteration, since it's only based on a single image. – sansa Dec 01 '19 at 21:02
  • I had the same problem using Kaggle. It worked fine with batches of 64 and then once I tried 128 and got the error nothing worked. Even the batches of 64 gave me the same error. Tried resetting a few times. `torch.cuda.empty_cache()` did not work. Instead first disable the GPU, then restart the kernel, and reactivate the GPU. This worked for me. – multitudes Jul 01 '20 at 16:43
  • Reduce the batch size of the data being fed to your model. Worked for me – patrickpato Feb 27 '21 at 03:10

8 Answers

23

Although

    import torch
    torch.cuda.empty_cache()

provides a good alternative for clearing the occupied CUDA memory, and we can also manually clear variables that are no longer in use with

    import gc
    del variables  # replace `variables` with the tensors you no longer need
    gc.collect()

the error might still appear afterwards, because PyTorch doesn't actually clear the memory; it only clears the references to the memory occupied by the variables. So reducing the batch_size after restarting the kernel and finding the optimum batch_size is the best possible option (but sometimes not a very feasible one).

Another way to get deeper insight into GPU memory allocation is to use:

    torch.cuda.memory_summary(device=None, abbreviated=False)

where both arguments are optional. This gives a readable summary of memory allocation and lets you figure out why CUDA is running out of memory, then restart the kernel so the error doesn't happen again (just like I did in my case).

Passing the data in iteratively might help, but changing the size of your network's layers or breaking them down would also prove effective (sometimes the model itself occupies significant memory, for example when doing transfer learning).
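
For example, with transfer learning you can freeze most of a pretrained backbone and train only a small head, which removes the gradient and optimizer-state memory for the frozen layers. A minimal sketch, assuming torchvision is available; the ResNet-18 backbone and the 10-class head are just placeholders:

    import torch
    import torchvision

    # Freeze a pretrained backbone so no gradients (and no gradient buffers)
    # are kept for it; only the small new head is trained.
    model = torchvision.models.resnet18(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False  # frozen parameters need no gradient memory

    model.fc = torch.nn.Linear(model.fc.in_features, 10)  # new, trainable head

    # Pass only the trainable parameters to the optimizer.
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )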

SHAGUN SHARMA
  • `This gives a readable summary of memory allocation and allows you to figure the reason of CUDA running out of memory`. I printed out the results of the `torch.cuda.memory_summary()` call, but there doesn't seem to be anything informative that would lead to a fix. I see rows for `Allocated memory`, `Active memory`, `GPU reserved memory`, etc. What should I be looking at, and how should I take action? – stackoverflowuser2010 Sep 18 '20 at 00:54
  • I have a small laptop with MX130 and 16GB ram. Suitable batchsize was 4. – Gayan Kavirathne Oct 15 '20 at 15:47
  • @stackoverflowuser2010 You should be printing it out between function calls to see which one causes the most memory increase – JobHunter69 May 05 '21 at 17:27
14

Send the batches to CUDA iteratively, and use small batch sizes. Don't send all your data to CUDA at once at the beginning. Rather, do it as follows:

for e in range(epochs):
    for images, labels in train_loader:
        if torch.cuda.is_available():
            images, labels = images.cuda(), labels.cuda()  # move only this batch to the GPU
        # training step goes here

You can also use dtypes that use less memory, for instance `torch.float16` (also exposed as `torch.half`).
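
If you want half precision without converting the model by hand, PyTorch's automatic mixed precision is one option. A minimal sketch, assuming `model`, `criterion`, `optimizer`, `epochs`, and `train_loader` from the loop above are already defined:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # scales the loss so float16 gradients don't underflow

    for e in range(epochs):
        for images, labels in train_loader:
            images, labels = images.cuda(), labels.cuda()
            optimizer.zero_grad()
            with torch.cuda.amp.autocast():  # forward pass runs in mixed precision
                output = model(images)
                loss = criterion(output, labels)
            scaler.scale(loss).backward()    # backward on the scaled loss
            scaler.step(optimizer)
            scaler.update()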

Nicolas Gervais
  • I get this error message inside a jupyter notebook if I run a cell that starts training more than once. Restarting the kernel fixes this, but it would be nice if we could clear the cache somehow... For instance, `torch.cuda.empty_cache()` doesn't help as of now. Even though it probably should... :( – dvdblk Jun 11 '20 at 21:56
6

Just reduce the batch size, and it will work. While I was training, it gave the following error:

CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 10.76 GiB total capacity; 4.29 GiB already allocated; 10.12 MiB free; 4.46 GiB reserved in total by PyTorch)

I was using a batch size of 32, so I just changed it to 15 and it worked for me.
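
The batch size is usually set where the DataLoader is built; a minimal sketch, assuming `train_dataset` is your existing dataset:

    from torch.utils.data import DataLoader

    # Smaller batches keep fewer activations in GPU memory at once.
    train_loader = DataLoader(train_dataset, batch_size=15, shuffle=True)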

Rahul
4

Try not to drag your grads too far.

I got the same error when I tried to sum up the loss over all batches:

loss = self.criterion(pred, label)
total_loss += loss

Then I used `loss.item()` instead of `loss`, which requires grads, and that solved the problem:

loss = self.criterion(pred, label)
total_loss += loss.item()

The advice below is credited to yuval reina in the Kaggle question:

This error is related to the GPU memory and not the general memory => the @cjinny comment might not work.
Do you use TensorFlow/Keras or PyTorch?
Try using a smaller batch size.
If you use Keras, try to decrease some of the hidden layer sizes.
If you use PyTorch:
  • do you keep all the training data on the GPU all the time?
  • make sure you don't drag the grads too far
  • check the sizes of your hidden layers

pandas007
0

Implementation:

  1. Feed the images into the GPU batch by batch.

  2. Use a small batch size during training or inference.

  3. Resize the input images to a smaller size.

Technically:

  1. Most networks are over-parameterized, which means they are too large for the learning task, so finding an appropriate network structure can help:

a. Compress your network with techniques like model compression, network pruning and quantization.

b. Directly use a more compact network structure like MobileNet v1/2/3 (see the sketch after this list).

c. Network architecture search (NAS).
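
For point (b), a minimal sketch of swapping in a compact torchvision backbone (the 10-class head is just a placeholder):

    import torch
    import torchvision

    # MobileNetV2 is much lighter than typical large backbones.
    model = torchvision.models.mobilenet_v2(pretrained=True)
    model.classifier[1] = torch.nn.Linear(model.last_channel, 10)  # placeholder head
    model = model.cuda()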

david
0

The best way would be lowering the batch size. Usually it works. Otherwise try this:

import gc

del variable  # delete unnecessary variables
gc.collect()

Dharman
0

There are ways to avoid it, but it certainly depends on your GPU memory size:

  1. Load the data onto the GPU as you unpack it iteratively:

         for features, labels in batch:
             features, labels = features.to(device), labels.to(device)

  2. Use FP16 (half precision) instead of full-precision float dtypes.
  3. Try reducing the batch size if you run out of memory.
  4. Use the .detach() method on tensors you don't need gradients for, so they don't keep the computation graph (and its memory) alive.

If all of the above are used properly, the PyTorch library is already highly optimized and efficient.
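
A minimal sketch combining points 1 and 4, assuming `model`, `criterion`, `optimizer`, and `train_loader` are already defined:

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = model.to(device)

    running_loss = 0.0
    for features, labels in train_loader:
        # point 1: move only the current batch to the GPU
        features, labels = features.to(device), labels.to(device)
        preds = model(features)
        loss = criterion(preds, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        # point 4: detach before tracking so the autograd graph isn't kept alive
        running_loss += loss.detach().item()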

Nivesh Gadipudi
0

I had the same error but fixed it by resizing my images from ~600 to 100 using these lines:

import torchvision.transforms as transforms
transform = transforms.Compose([
    transforms.Resize((100, 100)), 
    transforms.ToTensor()
])
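
The transform is then attached to the dataset so every image is resized before it ever reaches the GPU; a minimal sketch, assuming an ImageFolder-style dataset and a hypothetical data/train path:

    import torchvision.datasets as datasets

    # Every image is shrunk to 100x100 before it is turned into a tensor.
    train_dataset = datasets.ImageFolder("data/train", transform=transform)
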
Samuel Prevost