
I have a set of 5 files in .npz format. I need to extract the NumPy arrays from these files one by one and use them to train a model. After loading the first array into memory and training the model with it, I try to release the memory by slicing the array down, but the memory consumed does not decrease. Because of this, I am unable to load the second array and eventually get a MemoryError.

How do I make sure that the memory is freed after training the model?

PS: The sizes of X_test and y_test are very small, so they can be ignored.

Code:

import gc
import resource

import numpy as np

import data          # project-local module (from the question)
import lipreadtrain  # project-local module (from the question)

for person_id in range(1, 5):  # note: this covers only files 1-4; range(1, 6) would cover all 5
    print "Initial memory ", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    temp1 = np.load("../final_data/speaker_input" + str(person_id))
    X_train = temp1['arr_0']
    y_train = np.load("../final_data/speaker_final_output" + str(person_id) + ".npy")
    X_test, y_test = data.Test(person_id=1)
    print "Input dimension ", X_train.shape
    print "Output dimension ", y_train.shape
    print "Before training ", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    lipreadtrain.train(model=net, X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test)
    print "After training ", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    X_train = X_train[:1]   # attempt to free memory by slicing -- does not work
    y_train = y_train[:1]
    X_test = X_test[:1]
    y_test = y_test[:1]
    print len(X_train), len(y_train), len(X_test), len(y_test)
    gc.collect()
    temp1.close()

Output:

Initial memory  861116
Input dimension  (8024, 50, 2800)
Output dimension (8024, 53)
Before training  9642152
Training the model, which will take a long long time...
Epoch 1/1
8024/8024 [==============================] - 42s - loss: nan - acc: 0.2316        
----- Training Takes 42.3187870979 Seconds -----
Finished!
After training  9868080
1 1 0 0
Initial memory  9868080
Traceback (most recent call last):
File "test.py", line 21, in <module>
X_train = temp1['arr_0']
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/npyio.py",           line 224, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "/home/ubuntu/anaconda2/lib/python2.7/site-packages/numpy/lib/format.py", line 661, in read_array
array = numpy.empty(count, dtype=dtype)
MemoryError
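(A note on the numbers printed above: `ru_maxrss` reports the *peak* resident set size, which never decreases, so even memory that really is freed will not show up as a drop in these prints. A minimal sketch of this behavior, assuming Linux semantics and an arbitrary ~200 MB allocation:)

```python
import resource

def peak_kb():
    # ru_maxrss = *peak* resident set size so far (KiB on Linux);
    # it is monotonically non-decreasing and cannot show freed memory.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_kb()
buf = bytearray(200 * 1024 * 1024)   # touch ~200 MB
during = peak_kb()
del buf                              # the memory is actually freed here...
after = peak_kb()
print(before, during, after)         # ...but the peak stays at its high-water mark
```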

1 Answer


The problem is that a slice is just a view: it keeps the underlying buffer alive even if you delete or rebind the name of the parent array. As an illustration:

import numpy as np

res = []
i = 0
while True:
    t = np.zeros(10**10, np.uint8)  # ~10 GB allocation
    u = t[:1]                       # a view: keeps all of t alive
    res.append(u)
    i = i + 1
    print(i)

gives:

In [17]: (executing lines 1 to 9 of "<tmp 1>")
1
2
Traceback (most recent call last):
  File "<tmp 1>", line 5, in <module>
    t = np.zeros(10**10, np.uint8)
MemoryError

Now make a copy, u = t[:1].copy(), instead of u = t[:1]; then t is released on each iteration:

In [18]: (executing lines 1 to 9 of "<tmp 1>")
1
2
3
4
5
....
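The mechanism is visible directly through ndarray.base, which a view uses to hold on to its parent (the small size here is just for illustration):

```python
import numpy as np

a = np.zeros(10**6, dtype=np.uint8)   # a ~1 MB buffer
view = a[:1]                          # basic slicing returns a view
copy = a[:1].copy()                   # .copy() allocates its own 1-byte buffer

print(view.base is a)      # True  -> the view keeps the whole of a alive
print(copy.base is None)   # True  -> the copy lets a be garbage-collected
```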
B. M.
  • Thanks for the reply. When I run X_train = X_train[:1].copy(), it gives me an AttributeError: 'list' object has no attribute 'copy' error. I tried importing copy but it didn't work. If I try X_train = np.asarray(X_train[:1]).copy(), I get the MemoryError, and if I use X_train = list(X_train[:1]).copy() I get the AttributeError. Is this a Python 3-specific function? My current Python version is 2.7. Also, X_train was originally a numpy array. – Ram Mar 12 '16 at 15:49
  • Curious. X_train[:1] has normally the same type as X_train. perhaps X_train[0] will work ? Can you share a sample file to check ? – B. M. Mar 12 '16 at 17:19
  • It looks like `X_train` was transformed from an array to a list, or maybe a tuple of lists. – hpaulj Mar 12 '16 at 18:18
  • If I try X_train = X_train[0].copy, I get a MemoryError again. Any solution to this problem? Each file is close to 300MB, so sharing would not be easy. Can you create a random numpy array of dimensions 8000,50,2800 and then try? – Ram Mar 12 '16 at 19:29
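Following up on the comments: whatever type X_train ends up as, the general cure is the same: drop every reference to the big object (or replace the view with an owning copy) instead of keeping a slice of it. A weakref makes the release observable; the small array shape below is just a stand-in for the real data:

```python
import gc
import weakref

import numpy as np

X_train = np.zeros((100, 50, 28))   # small stand-in for the real array
probe = weakref.ref(X_train)        # lets us see when the array is collected

X_train = X_train[:1]               # a view: the original stays alive via .base
print(probe() is None)              # False

X_train = X_train.copy()            # replace the view with an owning copy
gc.collect()
print(probe() is None)              # True: the big buffer has been released
```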