Questions tagged [numpy-memmap]

An advanced numpy.memmap() utility for avoiding RAM-size limits and reducing the final RAM footprint, at the reasonable cost of O/S-cached file I/O mediated via a small in-RAM proxy-view window into the whole array data.

Creates and handles a memory-map to an array stored in a binary file on disk.

Memory-mapped files are used to access large arrays that do not fit in RAM through small proxy segments of an O/S-cached region of otherwise unmanageably large data files.

Leaving most of the data on disk, rather than reading the entire file into RAM, and working with it through a smart, moving, O/S-cached window-view into the big on-disk file lets a program escape both the O/S RAM limits and the adverse side-effects of Python's memory management, which is painfully reluctant to release once-allocated memory blocks before the Python program terminates.

numpy's memmaps are array-like objects.

This differs from Python's mmap module, which uses file-like objects.
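A minimal sketch of the typical round-trip; the file name, dtype and shape are only illustrative:

    import numpy as np

    # Create a disk-backed array; the file name, dtype and shape are illustrative.
    fp = np.memmap("data.dat", dtype="float32", mode="w+", shape=(10_000, 4))
    fp[:100] = np.random.rand(100, 4)   # writes go through the O/S page cache
    fp.flush()                          # push dirty pages out to disk

    # Re-open read-only later; only the pages actually touched are brought into RAM.
    ro = np.memmap("data.dat", dtype="float32", mode="r", shape=(10_000, 4))
    print(ro[:5].mean())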

83 questions
15 votes • 2 answers

Can memmap pandas series. What about a dataframe?

It seems that I can memmap the underlying data for a python series by creating a mmap'd ndarray and using it to initialize the Series. def assert_readonly(iloc): try: iloc[0] = 999 # Should be non-editable …
user48956 • 11,390 • 14 • 67 • 125
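A hedged sketch of the idea in the excerpt, not the asker's code: back a Series with a memmapped ndarray and ask pandas not to copy the buffer. The file name and shape are made up, and whether pandas actually avoids the copy depends on the pandas version.

    import numpy as np
    import pandas as pd

    # Hypothetical file written earlier with a matching dtype and length.
    arr = np.memmap("series.dat", dtype="float64", mode="r", shape=(1_000_000,))

    # copy=False asks pandas to wrap the existing buffer rather than duplicate it;
    # whether the memmap backing survives depends on the pandas version in use.
    s = pd.Series(arr, copy=False)
    print(s.iloc[:5])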
6 votes • 1 answer

numpy mean is larger than max for memmap

I have an array of timestamps, increasing for each row in the 2nd column of matrix X. I calculate the mean value of the timestamps and it's larger than the max value. I'm using a numpy memmap for storage. Why is this happening? >>>…
siamii • 20,540 • 26 • 86 • 136
5 votes • 2 answers

How to read a large text file avoiding reading line-by-line :: Python

I have a large data file (N, 4) which I am mapping line-by-line. My files are 10 GB; a simplistic implementation is given below. Though the following works, it takes a huge amount of time. I would like to implement this logic such that the text file…
nuki • 101 • 5
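One common workaround, sketched here under the assumption that a one-off conversion pass is acceptable: turn the text file into a binary .npy once, then memory-map it in every later run. File names are illustrative.

    import numpy as np

    # One-off conversion: still parses the text file, but only once.
    # (For files too large even for this pass, convert in chunks instead.)
    data = np.loadtxt("big_file.txt")             # shape (N, 4)
    np.save("big_file.npy", data)

    # Every later run: the open is near-instant and pages load lazily on access.
    data = np.load("big_file.npy", mmap_mode="r")
    chunk = data[1_000_000:1_001_000, :]          # only this slice is read from disk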
5 votes • 2 answers

numpy memmap memory usage - want to iterate once

Let's say I have some big matrix saved on disk. Storing it all in memory is not really feasible, so I use memmap to access it: A = np.memmap(filename, dtype='float32', mode='r', shape=(3000000,162)). Now let's say I want to iterate over this matrix (not…
user2717954 • 1,518 • 2 • 12 • 26
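A sketch of block-wise iteration over such a memmap, so only one chunk of rows has to be resident at a time; the block size is an assumption to be tuned to the available RAM.

    import numpy as np

    A = np.memmap("matrix.dat", dtype="float32", mode="r", shape=(3_000_000, 162))

    block = 100_000                                 # rows per chunk, tune to RAM
    total = np.zeros(A.shape[1], dtype="float64")
    for start in range(0, A.shape[0], block):
        chunk = np.array(A[start:start + block])    # copy the slice into real RAM
        total += chunk.sum(axis=0, dtype="float64") # pages of the slice can then be evicted
    print(total / A.shape[0])                       # column means in a single pass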
5 votes • 1 answer

Do xarray or dask really support memory-mapping?

In my experimentation so far, I've tried: xr.open_dataset with chunks arg, and it loads the data into memory. Set up a NetCDF4DataStore, and call ds['field'].values and it loads the data into memory. Set up a ScipyDataStore with mmap='r', and…
4 votes • 0 answers

Caching a data frame in joblib

Joblib has functionality for sharing Numpy arrays across processes by automatically memmapping the array. However this makes use of Numpy specific facilities. Pandas does use Numpy under the hood, but unless your columns all have the same data type,…
shadowtalker • 8,614 • 2 • 34 • 70
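For plain NumPy arrays (not DataFrames), joblib's own persistence can round-trip through a memmap; a minimal sketch with an illustrative file name:

    import numpy as np
    import joblib

    arr = np.random.rand(1_000_000, 10)
    joblib.dump(arr, "cache.joblib")                   # persist the array to disk

    # Later, or inside a worker process: load it back memory-mapped.
    arr_mm = joblib.load("cache.joblib", mmap_mode="r")
    print(type(arr_mm))                                # numpy.memmap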
4 votes • 1 answer

Numpy Memmap Ctypes Access

I'm trying to use a very large numpy array using numpy memmap, accessing each element as a ctypes Structure. class My_Structure(Structure): _fields_ = [('field1', c_uint32, 3), ('field2', c_uint32, 2), ('field3',…
sheridp • 1,187 • 1 • 9 • 19
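A hedged sketch of viewing memmapped bytes through a ctypes Structure; the record layout and file name are illustrative and deliberately avoid the bit-fields shown in the excerpt.

    import numpy as np
    from ctypes import Structure, c_uint32, sizeof

    class Record(Structure):
        _fields_ = [("field1", c_uint32), ("field2", c_uint32)]

    n = 1000
    mm = np.memmap("records.dat", dtype=np.uint8, mode="w+",
                   shape=(n * sizeof(Record),))

    # from_buffer needs a writable buffer; a mode="r+"/"w+" memmap provides one.
    rec = Record.from_buffer(mm, 5 * sizeof(Record))   # view onto the 6th record
    rec.field1 = 42                                    # writes land straight in the map
    mm.flush()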
4 votes • 2 answers

packing a boolean array needs to go through int (numpy 1.8.2)

I'm looking for a more compact way to store booleans. numpy internally needs 8 bits to store one boolean, but np.packbits allows packing them, which is pretty cool. The problem is that to pack a 32e6-byte array of booleans into a 4e6-byte array we need…
user3313834 • 5,701 • 4 • 40 • 76
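For reference, a minimal packbits round-trip with illustrative sizes:

    import numpy as np

    bools = np.random.rand(32_000_000) > 0.5       # 32e6 booleans, one byte each in numpy
    packed = np.packbits(bools)                     # 4e6 bytes: 8 booleans per byte
    restored = np.unpackbits(packed).astype(bool)   # back to one byte per boolean
    assert (restored[:bools.size] == bools).all()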
3 votes • 0 answers

Numpy Memmap WinError8

My first StackOverflow message after 6 years of using great experience from this site. Thank you all for all the great help you have offered to me and to others. This problem, however, baffles me completely and I would like to ask for assistance…
3 votes • 0 answers

numpy memmap read error memory mapped size must be positive

I am reading a large binary file in partitions. Each partition is mapped using numpy.memmap. The file consist of 1M rows, where a row is 198 2-byte integers. A partition is 1000 rows long. Below is the code snippet: mdata = np.memmap(fn,…
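A sketch of partitioned mapping with an explicit byte offset, following the row layout in the excerpt (198 two-byte integers per row); the "size must be positive" error usually means the offset or shape ran past the end of the file, so the last partition is clamped here. The file name is illustrative.

    import numpy as np

    rows_total = 1_000_000
    cols = 198
    itemsize = np.dtype(np.int16).itemsize           # 2 bytes per value
    part_rows = 1000

    for start in range(0, rows_total, part_rows):
        n = min(part_rows, rows_total - start)        # clamp the final partition
        mdata = np.memmap("big.bin", dtype=np.int16, mode="r",
                          offset=start * cols * itemsize, shape=(n, cols))
        # ... process mdata ...
        del mdata                                     # drop the mapping before the next one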
3 votes • 1 answer

Python: passing memmap array through function?

Suppose that I am working with a very large array (e.g., ~45 GB) and am trying to pass it through a function which accepts numpy arrays. What is the best way to: Store this for limited memory? Pass this stored array into a function that takes…
Andy • 155 • 1 • 7
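A sketch of the usual pattern, with made-up file names and shapes: persist the data once as .npy, reopen it memory-mapped, and pass it straight in, since numpy.memmap is an ndarray subclass that most array-consuming functions accept.

    import numpy as np

    def column_means(a):                        # stands in for any ndarray-consuming function
        return a.mean(axis=0)

    # One-off: write the data as .npy (random data stands in for the real ~45 GB).
    np.save("huge.npy", np.random.rand(10_000, 8))

    big = np.load("huge.npy", mmap_mode="r")    # opens a memmap, not the whole file
    print(column_means(big))                    # works, though it may still page a lot in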
3 votes • 0 answers

When updating a numpy.memmap'd file in parallel, is there a way to only "flush" a slice and not the whole file?

I have to do a lot of nasty i/o and I have elected to use memory mapped files with numpy...after a lot of headache I realized that when a process "flushes" to disk it often overwrites what other processes are attempting to write with old data...I…
3 votes • 1 answer

Why am I getting an OverflowError and WindowsError with numpy memmap and how to solve it?

In relation to my other question here, this code works if I use a small chunk of my dataset with dtype='int32', using a float64 produces a TypeError on my main process after this portion because of safe rules so I'll stick to working with int32 but…
ZeferiniX • 421 • 4 • 16
3 votes • 1 answer

Memory Error when using float32 in dask array

I am trying to import a 1.25 GB dataset into python using dask.array The file is a 1312*2500*196 Array of uint16's. I need to convert this to a float32 array for later processing. I have managed to stitch together this Dask array in uint16, however…
Amdixer • 61 • 2
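A hedged sketch of the usual dask pattern, deferring the dtype cast so it is applied chunk by chunk; the chunk shape and the memmapped source file are assumptions.

    import numpy as np
    import dask.array as da

    raw = np.memmap("stack.dat", dtype=np.uint16, mode="r", shape=(1312, 2500, 196))

    x = da.from_array(raw, chunks=(128, 2500, 196))   # lazy, chunked view of the memmap
    y = x.astype(np.float32)                          # the cast runs per chunk, not up front
    result = y.mean().compute()                       # only a few float32 chunks in RAM at once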
2 votes • 0 answers

Numpy memmap throttles with Pytorch Dataloader when available RAM less than file size

I'm working on a dataset that is too big to fit into RAM. The solution I'm trying currently is to use numpy memmap to load one sample/row at a time using Dataloader. The solution looks something like this: class MMDataset(torch.utils.data.Dataset): …
Kevin • 71 • 1 • 4
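A sketch of the pattern described in the excerpt, with illustrative shapes and file name; the memmap is opened lazily so each DataLoader worker gets its own mapping, and when the available RAM is smaller than the file every row access can hit disk, which is where the throttling comes from.

    import numpy as np
    import torch
    from torch.utils.data import Dataset, DataLoader

    class MMDataset(Dataset):
        def __init__(self, path, rows, cols):
            self.path, self.rows, self.cols = path, rows, cols
            self.data = None                     # opened lazily, once per worker process

        def __len__(self):
            return self.rows

        def __getitem__(self, idx):
            if self.data is None:
                self.data = np.memmap(self.path, dtype="float32", mode="r",
                                      shape=(self.rows, self.cols))
            # copy the row out of the map before handing it to torch
            return torch.from_numpy(np.array(self.data[idx]))

    loader = DataLoader(MMDataset("train.dat", 1_000_000, 162),
                        batch_size=256, num_workers=2)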