
Memory error when loading an image dataset for Conv2D training

My dataset has 70k images that I want to train a Conv2D network on, but loading the dataset throws a memory error. I only have 4 GB of RAM. How can I resolve this with an HDF5 matrix, i.e. by creating an HDF5 dataset first and then loading it for training? I guess that would use less memory. I followed a tutorial for creating an HDF5 dataset, but that step comes after the point where the error occurs. What am I doing wrong? Please ask if the question is not clear.

import os
import numpy
from PIL import Image
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=40,
                             width_shift_range=0.2,
                             height_shift_range=0.2,
                             rescale=1./255,
                             shear_range=0.2,
                             zoom_range=0.2,
                             horizontal_flip=True)

batch_size=28
num_classes=37
epochs=100

os.chdir("E:");
path="Dataset/One";
classes=os.listdir(path)
x=[]#Datapoints 
y=[]#labels 
for fol in classes:
    imgfiles=os.listdir(path+u'\\'+fol);
    for img in imgfiles:
        im=Image.open(path+u'\\'+fol+u'\\'+img);
        im=numpy.asarray(im)/255;
        x.append(im)
        y.append(fol)
x=numpy.array(x)
y=numpy.array(y)
#memory error####################################################
x=x.reshape((-1,100,100,1))

n=x.shape[0]
randomize=numpy.arange(n)
numpy.random.shuffle(randomize)
randomize
x=x[randomize]
y=y[randomize]
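For reference, here is a minimal sketch of writing the images straight into an HDF5 file with h5py, one image at a time, so the full array never has to exist in RAM. The file name, dataset names, and the 100x100 grayscale shape are assumptions taken from the code above:

import os
import numpy
import h5py
from PIL import Image

path = "Dataset/One"
classes = os.listdir(path)

# list all (folder, file) pairs first so the datasets can be pre-allocated on disk
files = [(fol, img) for fol in classes for img in os.listdir(os.path.join(path, fol))]
n = len(files)

with h5py.File("dataset.h5", "w") as f:
    dset_x = f.create_dataset("x", shape=(n, 100, 100, 1), dtype="float32")
    dset_y = f.create_dataset("y", shape=(n,), dtype=h5py.special_dtype(vlen=str))
    for i, (fol, img) in enumerate(files):
        im = numpy.asarray(Image.open(os.path.join(path, fol, img)), dtype="float32") / 255
        dset_x[i] = im.reshape(100, 100, 1)  # one image at a time, never the full array in RAM
        dset_y[i] = fol

The saved file can then be read back slice by slice (or wrapped in keras.utils.HDF5Matrix, if your Keras version provides it), so only part of the data is in memory at once.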
user9456630

2 Answers


Your problem is that you try to load all the data at once, and it is much larger than your RAM. You need to load just one batch and process it, then discard that batch and move on. A natural way to do this might be inside the for fol in classes loop--treat each fol value as one batch, and fit one batch at a time.
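A minimal sketch of that per-folder idea, assuming the 100x100 grayscale images from the question and a small placeholder model (the model below is only an assumption for illustration; substitute your own Conv2D architecture):

import os
import numpy
from PIL import Image
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense
from keras.utils import to_categorical

path = "Dataset/One"
classes = sorted(os.listdir(path))
class_index = {name: i for i, name in enumerate(classes)}

# placeholder model; replace with your own Conv2D architecture
model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(100, 100, 1)),
    Flatten(),
    Dense(len(classes), activation='softmax'),
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

for epoch in range(100):
    for fol in classes:
        batch_x, batch_y = [], []
        for img in os.listdir(os.path.join(path, fol)):
            im = numpy.asarray(Image.open(os.path.join(path, fol, img))) / 255
            batch_x.append(im.reshape(100, 100, 1))
            batch_y.append(class_index[fol])
        # only this folder's images are in memory at any one time
        model.train_on_batch(numpy.array(batch_x),
                             to_categorical(batch_y, num_classes=len(classes)))

Note that the ImageDataGenerator you already create in the question can also do this kind of streaming for you via flow_from_directory, which reads images from disk in batches instead of requiring the whole array up front.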

John Zwinck

If you don't need to access or process all of the data at once, you can load it in chunks.

If it's a CSV file and you can use pandas, then maybe you can do it like this:

import pandas as pd

for chunk in pd.read_csv('dataset/movies.csv', chunksize=1000):
    # use this chunk for processing and/or training
    pass
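The same chunked idea also works for an HDF5 file; a sketch, assuming a file containing a dataset named "x" like the h5py example in the question (the file and dataset names are assumptions):

import h5py

with h5py.File('dataset.h5', 'r') as f:
    dset = f['x']
    for start in range(0, dset.shape[0], 1000):
        chunk = dset[start:start + 1000]  # only this slice is read into RAM
        # use this chunk for processing and/or training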

Hope it helps!

Nuhman