Questions tagged [numpy-memmap]

numpy.memmap creates and handles a memory map to an array stored in a binary file on disk. It avoids RAM-size limits and keeps the final RAM footprint small, at the reasonable cost of O/S-cached file I/O mediated through a small in-RAM proxy-view window into the whole array data.

Memory-mapped files provide access to large arrays that do not fit in RAM through small proxy segments of an O/S-cached area of otherwise unmanageably large data files.

Leaving most of the data on disk, rather than reading the entire file into RAM, and working with it through a smart, moving, O/S-cached window-view into the big on-disk file makes it possible to escape both O/S RAM limits and the adverse side effects of Python's memory management, which is painfully reluctant to release once-allocated memory blocks before the Python program terminates.

NumPy's memmaps are array-like objects.

This differs from Python's mmap module, which uses file-like objects.
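
A minimal sketch of the basic pattern; the file name and shape below are illustrative only:

```python
import numpy as np

# Create a disk-backed array; 'data.dat' and the shape are placeholders
mm = np.memmap('data.dat', dtype=np.float32, mode='w+', shape=(10_000, 1_000))
mm[:100] = 1.0          # only the touched pages are materialised in RAM
mm.flush()              # push pending changes to the file on disk
del mm                  # drop the reference; the O/S unmaps the file

# Re-open the same file read-only; nothing is read until it is sliced
ro = np.memmap('data.dat', dtype=np.float32, mode='r', shape=(10_000, 1_000))
print(ro[:2, :5])
```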

83 questions
1
vote
1 answer

Unable to free memory consumed by numpy arrays

I have a set of 5 files in the .npz format. I need to extract the numpy arrays from these files one by one and then use them to train a model. After loading the first numpy array into memory and training the model with it, if I try to remove it from…
Ram
  • 604
  • 7
  • 22
1
vote
0 answers

Concatenating numpy memmap'd files into a single memmap

I have a very large number (>1000) of files, each about 20MB, which represent continuous time-series data saved in a simple binary format such that if I concatenate them all directly, I recover my full time series. I would like to do this virtually…
KBriggs
  • 1,098
  • 2
  • 16
  • 38
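
One hedged approach to the technique this question asks about, assuming each file holds a flat float32 series and that copying into one combined file (rather than a truly virtual view) is acceptable: allocate a single large output memmap and fill it file by file. File names are illustrative.

```python
import os
import numpy as np

files = sorted(['chunk_000.bin', 'chunk_001.bin'])   # hypothetical input files
dtype = np.float32                                   # assumed sample type
sizes = [os.path.getsize(f) // np.dtype(dtype).itemsize for f in files]

out = np.memmap('full_series.dat', dtype=dtype, mode='w+', shape=(sum(sizes),))
offset = 0
for f, n in zip(files, sizes):
    part = np.memmap(f, dtype=dtype, mode='r', shape=(n,))
    out[offset:offset + n] = part          # streamed copy, one file at a time
    offset += n
out.flush()
```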
1
vote
1 answer

Crashing RAM using memmap in Oja rule

I am using Oja's rule on a dataset of size 400x156300. It seems to crash my RAM. I am not sure what is causing this. Please help. I have 12 GB of RAM. Tried using memmap but still crashing!! #convert memmap and reduce…
Abhishek Bhatia
  • 7,916
  • 14
  • 73
  • 134
1
vote
2 answers

Open np.memmap() binary file within a python with-context manager

I have a very strange problem where I can't open a file from one of my larger scripts. This problem is intermittent and I can't seem to understand the error. I am getting this error: IOError: [Errno 22] invalid mode ('w+') or filename:…
dubbbdan
  • 2,234
  • 16
  • 33
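
np.memmap itself is not a context manager; a small wrapper like the hypothetical open_memmap_ctx below is one way to get with-statement semantics. The flush/cleanup policy here is an assumption, not part of the numpy API.

```python
from contextlib import contextmanager
import numpy as np

@contextmanager
def open_memmap_ctx(path, **kwargs):
    """Hypothetical helper: yield a memmap and tidy up on exit."""
    mm = np.memmap(path, **kwargs)
    try:
        yield mm
    finally:
        if mm.mode != 'r':     # only writable maps have changes to persist
            mm.flush()
        del mm                 # drop this reference; the map is released once none remain

with open_memmap_ctx('data.dat', dtype=np.float32, mode='w+', shape=(100,)) as mm:
    mm[:] = 0.0
```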
0
votes
0 answers

Why do I receive an error in loading a hdr file to memmap

The following code is a valid section from a script: import numpy as np from scipy.linalg import norm import os.path filename = os.path.join('filename.hdr') rows, bands, cols = 500,425,680 mm = np.memmap(filename, dtype=np.float32,…
0
votes
0 answers

How to use a ndarray of stored ndarrays with memmap as a big ndarray tensor

I recently started to use numpy memmap to link an array in my project, since I have a 3-dimensional tensor with a total of 133 billion values for a graph of the dataset I am using as an example. I am trying to calculate the heat kernel signature of a…
Ripper346
  • 379
  • 3
  • 12
0
votes
0 answers

Concatenate two memmapped numpy arrays of different sizes together (Python)?

I'm currently trying to use pysat to solve some k-colorability problems I have. The problem I'm having is I have a CNF formula (a list of lists) of the form: [1 2 3 4 5 6 7] [8 9 10 11 12 13 14] ... [71 72 73 74 75 76 77] [-60 -80] [-61 -81] ... So…
0
votes
0 answers

Overhead of loading large numpy arrays

My question is simple; and I could not find a resource that answers it. Somewhat similar links are using asarray, on numbers in general, and the most succinct one here. How can I "calculate" the overhead of loading a numpy array into RAM (if there…
emil
  • 85
  • 7
0
votes
0 answers

Is there a way to know how much memory a numpy.memmap is currently using?

I want to investigate the memory usage of a python program that uses numpy.memmap to access data from large files. Is there a way to check the size in memory that a memmap is currently using? I tried sys.getsizeof on the numpy object and the _mmap…
Colin
  • 8,627
  • 10
  • 42
  • 50
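
There is no single numpy attribute that reports this; one rough, hedged way to observe it is to watch the process's resident set size before and after touching the mapped data, using psutil as an assumed extra dependency. File name and shape are illustrative.

```python
import os
import numpy as np
import psutil     # assumed extra dependency

proc = psutil.Process(os.getpid())
rss_before = proc.memory_info().rss

mm = np.memmap('data.dat', dtype=np.float32, mode='r', shape=(10_000, 1_000))
_ = float(mm[:1_000].sum())     # touching pages faults them into RAM (counted in RSS)

rss_after = proc.memory_info().rss
print(f"RSS grew by roughly {(rss_after - rss_before) / 1e6:.1f} MB")
```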
0
votes
0 answers

How to feed a conv2d net with a large npy file without overwhelming the RAM?

I have a large dataset in the .npy format of size (500000, 18). In order to feed it into a conv2D net using a generator, I split it into X and y and reshape them to the shapes (-1, 96, 10, 10, 17) and (-1, 1), respectively. However, when I feed it inside the…
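
A hedged sketch of the general pattern this question touches on, leaving out the question's specific reshape; the file name and batch size are assumptions. The .npy file is opened lazily with mmap_mode and batches are yielded one at a time:

```python
import numpy as np

def npy_batches(path, batch_size=256):
    """Yield successive batches from a large .npy file without loading it whole."""
    data = np.load(path, mmap_mode='r')                    # lazy, read-only memmap
    for start in range(0, len(data), batch_size):
        yield np.asarray(data[start:start + batch_size])   # copy only this batch into RAM

# e.g. for batch in npy_batches('train.npy'): feed one batch to the network
```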
0
votes
1 answer

Shuffling and importing few rows of a saved numpy file

I have 2 saved .npy files: X_train - (18873, 224, 224, 3) - 21.2GB Y_train - (18873,) - 148KB X_train is cats and dogs images (cats being in 1st half and dogs in 2nd half, unshuffled) and is mapped with Y_train as 0 and 1. Thus Y_train is…
Rahul Vishwakarma
  • 1,240
  • 1
  • 4
  • 14
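
One hedged approach to the technique this question asks about: open the saved .npy files with mmap_mode='r' and pull only a shuffled subset of rows into RAM. File names and the subset size are illustrative.

```python
import numpy as np

X = np.load('X_train.npy', mmap_mode='r')   # lazy view, nothing read yet
Y = np.load('Y_train.npy', mmap_mode='r')

idx = np.random.permutation(len(X))[:256]   # random subset of row indices
idx.sort()                                  # sorted reads are kinder to the page cache
X_batch = np.asarray(X[idx])                # fancy indexing copies only these rows
Y_batch = np.asarray(Y[idx])
```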
0
votes
0 answers

Using POSIX_FADV_DONTNEED to purge numpy memmap

We are attempting to overcome an issue detailed thoroughly in this post, where loading data from a large h5 caused the cache to fill, and subsequently caused swapping which rendered our machine unusable. As discussed on the post, we attempted to use…
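
For reference, the advice call itself is available from Python's standard library on Unix; a minimal sketch of dropping a file's cached pages after reading it through a memmap (file name and shape are illustrative):

```python
import os
import numpy as np

mm = np.memmap('data.dat', dtype=np.float32, mode='r', shape=(10_000, 1_000))
_ = float(mm.sum())        # reading through the map fills the O/S page cache

# Advise the kernel to drop cached pages for this file (Unix-only; length 0 = whole file)
fd = os.open('data.dat', os.O_RDONLY)
try:
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
```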
0
votes
1 answer

Numpy load part of *.npz file in mmap_mode

I know there already exists a similar question, which has not been answered. I have a very large numpy array saved in a npz file. I don't want it to be loaded completely (my RAM can't handle it entirely), but just want to load a part of it. This is…
Silvano
  • 25
  • 5
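
One hedged workaround for the situation this question describes, assuming the archive was written with np.savez and the member name is the default 'arr_0': extract that single .npy member once, then memmap the extracted file (mmap_mode does not apply inside an .npz archive).

```python
import zipfile
import numpy as np

with zipfile.ZipFile('big.npz') as zf:       # 'big.npz' is a placeholder name
    zf.extract('arr_0.npy', path='.')        # 'arr_0' is np.savez's default key

arr = np.load('arr_0.npy', mmap_mode='r')    # lazy, read-only view of the extracted file
print(arr.shape)
```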
0
votes
0 answers

How to put a numpy array entirely in RAM using numpy memmap?

I would like to use a memmap allocated numpy array that can be processed in parallel using joblib i.e. shared memory between different processes. But I also want the big array to be stored entirely on RAM to avoid the write/read to disk that memmap…
danny
  • 799
  • 6
  • 27
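
One hedged option on Linux, assuming /dev/shm is a tmpfs mount: back the memmap with a file under /dev/shm so the "disk" file actually lives in RAM, and let each worker re-open it read-only. The path, shape, and worker function are illustrative.

```python
import numpy as np
from joblib import Parallel, delayed

PATH = '/dev/shm/shared_array.dat'           # tmpfs-backed file, i.e. RAM-resident
SHAPE = (1_000, 1_000)                       # illustrative size

shared = np.memmap(PATH, dtype=np.float64, mode='w+', shape=SHAPE)
shared[:] = np.random.rand(*SHAPE)
shared.flush()

def row_sum(i):
    view = np.memmap(PATH, dtype=np.float64, mode='r', shape=SHAPE)  # re-open per worker
    return view[i].sum()

sums = Parallel(n_jobs=4)(delayed(row_sum)(i) for i in range(SHAPE[0]))
```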
0
votes
0 answers

What's the best way to re-order rows in a large binary file?

I have some large data files (32 x VERY BIG) that I would like to concatenate. However, the data were collected in the wrong order, so I need to reorder the rows as well. So far, what I am doing is: # Assume FILE_1 and FILE_2 are paths to the…