
I have to do a lot of nasty I/O, and I have elected to use memory-mapped files with numpy. After a lot of headache I realized that when a process "flushes" to disk, it often overwrites what other processes are attempting to write with old data. I know that with the `mmap` package you can flush just a chunk to disk. I would use `mmap` directly, but because my data are a mixture of zeros and very small numbers, it is a pain to figure out how many bytes they occupy as strings and which processor "owns" which chunk.
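(For reference, the raw `mmap` route would look roughly like the sketch below. It is untested and assumes the file already exists at the right size, with one float32 slot per rank; viewing the mapping through `numpy.frombuffer` sidesteps the string/byte-counting problem, though slots that share an OS page can still collide on flush.)

    import mmap
    import numpy
    from mpi4py import MPI

    rank = MPI.COMM_WORLD.Get_rank()
    itemsize = numpy.dtype('float32').itemsize

    with open('largedatafile', 'r+b') as f:
        mm = mmap.mmap(f.fileno(), 0)                 # map the whole file read/write
        data = numpy.frombuffer(mm, dtype='float32')  # writable numpy view, no byte counting
        data[rank] = numpy.random.randn()             # dummy task
        # mmap.flush(offset, size) syncs only a byte range, but offset must
        # be page-aligned, so round down to the page holding this rank's slot
        page = mmap.ALLOCATIONGRANULARITY
        start = (rank * itemsize // page) * page
        mm.flush(start, min(page, len(mm) - start))
        del data                                      # release the buffer before closing
        mm.close()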

Is there a way to do something like the following:

from mpi4py import MPI
import numpy

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()

if rank == 0:
    # mode='w+' creates the file at the right size and zero-fills it
    fp = numpy.memmap('largedatafile', dtype='float32', mode='w+', shape=(size,))
    fp.flush()
comm.Barrier()  # make sure the file exists before the other ranks open it

fp = numpy.memmap('largedatafile', dtype='float32', mode='r+', shape=(size,))
fp[rank] = numpy.random.randn()  # dummy task
fp.flush([rank])  # or fp[rank].flush() -- the chunk-wise flush I am after

So that each processor can concurrently update the mmap without flushing the old zeros back over the new data?
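One possible workaround along these lines would be numpy.memmap's `offset` argument: each rank maps only its own slice, so its `flush()` only syncs the page(s) backing that slice. A sketch (untested; one float32 slot per rank, and since slots smaller than a page still share pages with their neighbours, each rank's chunk would need to be padded to `mmap.ALLOCATIONGRANULARITY` to remove the overlap entirely):

    import numpy
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    size = comm.Get_size()
    rank = comm.Get_rank()
    itemsize = numpy.dtype('float32').itemsize

    if rank == 0:
        # create and zero-fill the file once
        numpy.memmap('largedatafile', dtype='float32', mode='w+', shape=(size,)).flush()
    comm.Barrier()

    # map only this rank's slot; flush() then syncs just the page(s) behind it
    fp = numpy.memmap('largedatafile', dtype='float32', mode='r+',
                      offset=rank * itemsize, shape=(1,))
    fp[0] = numpy.random.randn()  # dummy task
    fp.flush()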

  • Check [this answer](http://stackoverflow.com/a/16633274/832621); it avoids using flush and achieves multiprocess access to the memmap. – Saullo G. P. Castro Oct 09 '16 at 11:30
  • @Saullo Castro: reading in parallel seems to be what memmaps are made for... writing in parallel is not so simple; that solution doesn't do any writing. – SciPyInTheHole Oct 09 '16 at 19:47

0 Answers