Questions tagged [numpy-memmap]

An advanced numpy.memmap() utility to avoid RAM-size limits and reduce the final RAM footprint, at the reasonable cost of O/S-cached file I/O mediated through a small in-RAM proxy-view window into the whole array data.

Creates and handles a memory-map to an array stored in a binary file on disk.

Memory-mapped files provide access to large arrays that do not fit in RAM, through small proxy segments of an O/S-cached region of the otherwise unmanageably large data files.

Leaving most of the data on disk, without reading the entire file into RAM, and working with it through a smart, moving, O/S-cached window-view into the large on-disk file makes it possible to escape both the O/S RAM limits and the adverse side effects of Python's memory management, which is painfully reluctant to release once-allocated memory blocks before the Python program terminates.

NumPy's memmaps are array-like objects.

This differs from Python's mmap module, which uses file-like objects.
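A minimal sketch of the basic pattern (the filename and shape here are arbitrary examples):

    import numpy as np

    # Create a 10000 x 10000 float32 array backed by a file on disk.
    fp = np.memmap('big_array.dat', dtype='float32', mode='w+',
                   shape=(10000, 10000))

    # Assign through the proxy view; only the touched pages live in RAM.
    fp[0, :] = np.arange(10000, dtype='float32')
    fp.flush()     # push dirty pages to disk
    del fp         # drop the view; the file stays on disk

    # Re-open read-only later, again without loading the data into RAM.
    ro = np.memmap('big_array.dat', dtype='float32', mode='r',
                   shape=(10000, 10000))
    print(ro[0, :5])    # [0. 1. 2. 3. 4.]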

83 questions
1
vote
0 answers

Is there a maximum number of processes a numpy memmap can handle at a time?

I saw here that for multiprocessing, numpy memmaps should be used: https://joblib.readthedocs.io/en/latest/parallel.html#working-with-numerical-data-in-shared-memory-memmapping As this problem can often occur in scientific computing with numpy…
SantoshGupta7
  • 4,211
  • 4
  • 31
  • 64
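A minimal sketch of the shared-memmap pattern the linked joblib docs describe; the filename, shape and worker count are illustrative assumptions, not from the question:

    import numpy as np
    from joblib import Parallel, delayed

    def row_sum(path, shape, i):
        # Each worker re-opens the same file read-only; the O/S page
        # cache is shared, so no per-process copy of the data is made.
        mm = np.memmap(path, dtype='float64', mode='r', shape=shape)
        return mm[i].sum()

    if __name__ == '__main__':
        shape = (1000, 500)
        data = np.memmap('shared.dat', dtype='float64', mode='w+', shape=shape)
        data[:] = np.random.rand(*shape)
        data.flush()
        sums = Parallel(n_jobs=4)(
            delayed(row_sum)('shared.dat', shape, i) for i in range(shape[0]))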
1
vote
0 answers

How does numpy.memmap work on HDF5 with multiple datasets?

I'm trying to memory-map individual datasets in an HDF5 file: import h5py import numpy as np import numpy.random as rdm n = int(1E+8) rdm.seed(70) dset01 = rdm.rand(n) dset02 = rdm.normal(0, 1, size=n).astype(np.float32) with h5py.File('foo.h5',…
Indominus
  • 1,038
  • 9
  • 28
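One possible approach, sketched under the assumption that the datasets are stored contiguously (not chunked or compressed): h5py can report a dataset's byte offset inside the file, which np.memmap can then use directly.

    import h5py
    import numpy as np

    with h5py.File('foo.h5', 'r') as f:
        dset = f['dset01']                  # dataset name from the question
        # get_offset() returns a file offset only for contiguous,
        # uncompressed datasets; it is None for chunked storage.
        offset = dset.id.get_offset()
        shape, dtype = dset.shape, dset.dtype

    mm = np.memmap('foo.h5', dtype=dtype, mode='r', shape=shape, offset=offset)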
1
vote
1 answer

Assigning values to list slices of large dense square matrices (Python)

I'm dealing with large dense square matrices of size NxN ~(100k x 100k) that are too large to fit into memory. After doing some research, I've found that most people handle large matrices by either using numpy's memmap or the PyTables package.…
matohak
  • 441
  • 2
  • 12
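A sketch of the memmap route for this use case; the matrix size follows the question, and mode='w+' creates the backing file sparse on most filesystems, so the full 80 GB is never held in RAM:

    import numpy as np

    n = 100_000
    m = np.memmap('dense.dat', dtype='float64', mode='w+', shape=(n, n))

    # Assigning to a slice only pages in the blocks actually touched.
    m[1000:2000, :1000] = np.random.rand(1000, 1000)
    m.flush()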
1
vote
0 answers

How to create shared memory objects with joblib on linux?

I am working through the joblib shared memory tutorial. It seems that numpy.memmap dumps data to disk, which is unfortunate. However, using ramfs it should be theoretically possible to share memory between joblib processes on a linux box. Is there…
Him
  • 4,322
  • 2
  • 17
  • 57
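A minimal sketch of the tmpfs idea on Linux: /dev/shm is RAM-backed on most distributions, so a memmap file created there is shared memory in all but name (the path and shape are illustrative):

    import numpy as np

    # /dev/shm is a tmpfs (RAM-backed) mount on most Linux systems,
    # so this memmap never touches a physical disk.
    arr = np.memmap('/dev/shm/shared_block.dat', dtype='float64',
                    mode='w+', shape=(1_000_000,))
    arr[:] = 0.0    # any process mapping the same path sees these writes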
1
vote
1 answer

Shuffling large memory-mapped numpy array

I have an array of dimension (20000000, 247) of size around 30 GB in a .npy file. I have 32 GB available memory. I need to shuffle the data along rows. I have opened the file in mmap_mode. However, if I try anything other than in-place modification,…
Cyttorak
  • 12,969
  • 3
  • 16
  • 38
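One memory-bounded approach is an in-place Fisher-Yates shuffle over the rows, holding only two rows in RAM at a time; it is I/O-heavy (random access over the whole file) but never exceeds the memory budget. A sketch, assuming the array is stored as a .npy file:

    import numpy as np

    rng = np.random.default_rng(0)
    a = np.load('data.npy', mmap_mode='r+')   # writable memmap view

    # In-place Fisher-Yates over rows: two rows resident at a time.
    n = a.shape[0]
    for i in range(n - 1, 0, -1):
        j = rng.integers(0, i + 1)
        if i != j:
            tmp = a[i].copy()     # one row (~247 floats), tiny
            a[i] = a[j]
            a[j] = tmp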
1
vote
2 answers

Is there a way to load a numpy unicode array into a memmap?

I am trying to create an array of dtype='U' and save it using numpy.save(); however, when trying to load the saved file into a numpy.memmap I get an error related to the size not being a multiple of 'U3'. I am working with Python 3.5.2. I have…
Kour
  • 33
  • 7
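The usual cause of that size error is mapping the raw .npy file without accounting for its header, so the remaining byte count is no longer a multiple of the 12-byte 'U3' itemsize. np.load with mmap_mode parses the header and offset for you; a sketch:

    import numpy as np

    a = np.array(['abc', 'def', 'ghi'], dtype='U3')
    np.save('strings.npy', a)

    # np.load(..., mmap_mode) reads the .npy header and memmaps the
    # data region at the right offset, instead of guessing it by hand.
    mm = np.load('strings.npy', mmap_mode='r')
    print(mm[1])    # 'def'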
1
vote
0 answers

Is it possible to create a numpy.memmap of array of arrays?

I have (4,) arrays that I want to save to disk (the sizes I am working with cannot fit into memory, so I need to dynamically load what I need). However, I want to have them in a single numpy.memmap. Not sure if it is possible, but any…
Kour
  • 33
  • 7
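If the four sub-arrays share one length and dtype, they can live as the rows of a single 2-D memmap and be loaded one row at a time; a sketch under that assumption (ragged arrays would need a different layout, e.g. recorded offsets into a flat file):

    import numpy as np

    parts = [np.random.rand(1000) for _ in range(4)]   # four equal arrays

    mm = np.memmap('parts.dat', dtype='float64', mode='w+', shape=(4, 1000))
    for i, p in enumerate(parts):
        mm[i] = p
    mm.flush()

    # Later: load only the one sub-array that is needed.
    third = np.memmap('parts.dat', dtype='float64', mode='r', shape=(4, 1000))[2]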
1
vote
0 answers

In python, how does GC handle mmap?

I am writing a multiprocessing system in Python. One of the child processes is in charge of reading frames from a camera stream using cv2 and passing each frame along to another child process for some manipulation and previewing. The problem is that in…
royeet
  • 749
  • 1
  • 6
  • 12
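For what the question asks about: CPython unmaps an mmap when its reference count drops to zero (the cyclic GC is only involved if the object is caught in a reference cycle), but an explicit close() is deterministic. A small sketch with the standard mmap module:

    import mmap

    # Create a small scratch file so the mapping has something to map.
    with open('frame.raw', 'wb') as f:
        f.write(b'\x00' * 4096)

    with open('frame.raw', 'r+b') as f:
        buf = mmap.mmap(f.fileno(), 0)
        data = bytes(buf[:16])
        buf.close()    # unmap immediately, rather than waiting for GC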
1
vote
1 answer

Is it possible to close a memmap'd temporary file without flushing its contents?

Use Case: Enormous image processing. I employ memory-mapped temporary files when the intermediate dataset exceeds physical memory. I have no need to store intermediate results to disk after I'm done with them. When I delete them, numpy seems to flush…
Jesse Meyer
  • 305
  • 1
  • 3
  • 12
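On POSIX systems one workaround is to unlink the backing temp file while it is still mapped: numpy will still flush on close (the write cost remains), but the data lands in a nameless, doomed inode and never persists. A sketch, not from the question itself:

    import os
    import tempfile
    import numpy as np

    fd, path = tempfile.mkstemp()
    os.close(fd)
    scratch = np.memmap(path, dtype='float32', mode='w+', shape=(4096, 4096))
    os.unlink(path)     # the file has no name now; contents are throwaway

    scratch[:] = 1.0    # ... use as intermediate storage ...
    del scratch         # the flush still runs, but to the doomed inode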
1
vote
1 answer

Getting the index of the next element in a very large memmap which satisfies a condition

I have a memmap to a very large (10-100 GB) file containing current and voltage data. From a given starting index, I want to find the index of the next point for which the voltage satisfies a given condition. In the case of a relatively small list…
KBriggs
  • 1,098
  • 2
  • 16
  • 38
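A chunked scan keeps memory use bounded while still using vectorised comparisons; a sketch with an illustrative >= threshold condition standing in for the question's voltage test:

    import numpy as np

    def next_index(mm, start, threshold, chunk=1_000_000):
        """First index at or after `start` where mm[i] >= threshold,
        scanning one chunk at a time; returns -1 if there is none."""
        for lo in range(start, mm.shape[0], chunk):
            block = mm[lo:lo + chunk]              # only this chunk pages in
            hits = np.nonzero(block >= threshold)[0]
            if hits.size:
                return int(lo + hits[0])
        return -1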
1
vote
1 answer

Efficiency of random slicing on a numpy memory map

I have a 20GB, 100k x 100k 'float16' 2D array as a datafile. I load it to memory as follows: fp_read = np.memmap(filename, dtype='float16', mode='r', shape=(100000, 100000)) I then attempt to read slices from it. The vertical slices I need to take…
Attack68
  • 2,931
  • 1
  • 9
  • 25
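The asymmetry comes from the row-major (C-order) file layout: one float16 row is 200 kB of contiguous bytes, while a column touches one 2-byte element every 200 kB. If column slices are the hot path, it can pay to keep a transposed copy on disk; a sketch (filenames illustrative, and the transpose itself is an expensive one-off):

    import numpy as np

    n = 100_000
    fp = np.memmap('mtx.dat', dtype='float16', mode='r', shape=(n, n))

    # fp[i, :] reads ~200 kB sequentially; fp[:, j] issues ~100k
    # scattered reads. Build a transposed copy for fast column access:
    fpT = np.memmap('mtx_T.dat', dtype='float16', mode='w+', shape=(n, n))
    for i in range(0, n, 1000):                  # work in 1000-row bands
        fpT[:, i:i + 1000] = fp[i:i + 1000, :].T
    fpT.flush()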
1
vote
0 answers

Using Numpy Memmap to Read In Certain Rows or Columns

I just wanted to ask if it was possible to store a numpy array as a .npy file and then use memmap to look through it at certain rows/columns?
ajl123
  • 887
  • 1
  • 14
  • 35
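Yes; np.load with mmap_mode does exactly this, reading only the .npy header up front and paging in just the rows or columns that are indexed. A sketch:

    import numpy as np

    a = np.load('data.npy', mmap_mode='r')   # header only; no data read yet
    rows = np.array(a[[3, 17, 42], :])       # materialise just these rows
    col = np.array(a[:, 5])                  # copy one column into RAM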
1
vote
0 answers

Pandas DataFrame backed by numpy memmap ndarray appears to copy data during calculations

I would like to have a relatively large Pandas DataFrame backed by an ndarray from memmap (from shared memory). I have code (below) that works, however when I run a calculation on the dataframe, overall system usage (measured by top) goes up as if…
dllahr
  • 351
  • 2
  • 13
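A sketch of the wrap-without-copy setup (names and shape are illustrative): pd.DataFrame(..., copy=False) can reuse the memmap's buffer, but most subsequent operations allocate fresh in-RAM result arrays, which is consistent with the usage growth the question observes.

    import numpy as np
    import pandas as pd

    mm = np.memmap('table.dat', dtype='float64', mode='w+',
                   shape=(1_000_000, 8))

    # copy=False asks pandas to wrap the existing buffer, keeping the
    # frame memmap-backed ...
    df = pd.DataFrame(mm, copy=False)

    # ... but computations like df.mean() or df * 2 still allocate
    # ordinary in-RAM arrays for their results.
    means = df.mean()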
1
vote
0 answers

Delete numpy 2D memmap array if sum equals 0

I am using a numpy memmap object that acts as a 2D array: In [8]: data_2d.shape Out[8]: (16777216, 50) What is the best way to delete a row in which the sum of that row is zero?
SabCo
  • 61
  • 1
  • 5
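A memmap's file length is fixed, so rows cannot be deleted in place; the usual pattern is to stream the surviving rows into a new file, chunk by chunk. A sketch with illustrative filenames:

    import numpy as np

    shape = (16777216, 50)
    src = np.memmap('data_2d.dat', dtype='float32', mode='r', shape=shape)

    kept_rows, chunk = 0, 1_000_000
    with open('filtered.dat', 'wb') as out:
        for lo in range(0, shape[0], chunk):
            block = src[lo:lo + chunk]                 # memmap view
            keep = block[block.sum(axis=1) != 0]       # boolean copy in RAM
            keep.tofile(out)
            kept_rows += keep.shape[0]

    dst = np.memmap('filtered.dat', dtype='float32', mode='r',
                    shape=(kept_rows, 50))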
1
vote
2 answers

NumPy memmap performance issues

I have a large (75000 x 5 x 6000) 3D array stored as a NumPy memory map. If I simply iterate over the first dimension like so: import numpy as np import time a = np.memmap(r"S:\bin\Preprocessed\mtb.dat", dtype='float32', mode='r', shape=(75000, 5,…
triphook
  • 2,351
  • 16
  • 30
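One common remedy is to read the first axis in large contiguous batches, paying the memmap access overhead once per batch instead of once per 120 kB slice; a sketch using the shape and path from the question:

    import numpy as np

    shape = (75000, 5, 6000)
    a = np.memmap(r"S:\bin\Preprocessed\mtb.dat", dtype='float32',
                  mode='r', shape=shape)

    batch = 1000    # ~120 MB of float32 per batch
    for lo in range(0, shape[0], batch):
        block = np.array(a[lo:lo + batch])   # one bulk read into RAM
        for frame in block:
            pass                             # ... process each frame ...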