
I have some large data files (32 x VERY BIG) that I would like to concatenate. However, the data were collected in the wrong order, so I need to reorder the rows as well.

So far, what I am doing is:

# Assume FILE_1 and FILE_2 are paths to the appropriate files.
# FILE_1 is a matrix of size 32 x SIZE_1
# FILE_2 is a matrix of size 32 x SIZE_2
data_1 = np.memmap(FILE_1, mode='r', dtype='<i2', order='F', shape=(32, SIZE_1))
data_2 = np.memmap(FILE_2, mode='r', dtype='<i2', order='F', shape=(32, SIZE_2))

data_out = np.memmap('output', mode='w+', dtype='<i2', order='F', shape=(32, SIZE_1 + SIZE_2))

channel_mapping = [15, 14, 13, 12, 11, 10, 9, 8, 0, 1, 2, 3, 4, 5, 6, 7,
                   24, 25, 26, 27, 28, 29, 30, 31, 23, 22, 21, 20, 19, 18, 17, 16]

data_out[:, :SIZE_1] = data_1[channel_mapping, :]
data_out[:, SIZE_1:] = data_2[channel_mapping, :]

I actually do this in a for loop with more than 2 files, but you get the idea.
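For concreteness, here is a minimal runnable sketch of that loop. The file names, sizes, and demo input data below are placeholders (not the real files); the demo inputs are written in Fortran order so the `order='F'` memmaps read them back correctly:

```python
import numpy as np

channel_mapping = [15, 14, 13, 12, 11, 10, 9, 8, 0, 1, 2, 3, 4, 5, 6, 7,
                   24, 25, 26, 27, 28, 29, 30, 31, 23, 22, 21, 20, 19, 18, 17, 16]

# Hypothetical file list and per-file column counts, standing in for the real data.
FILES = ['file_1.dat', 'file_2.dat']
SIZES = [5, 7]

# Create small demo inputs so the sketch is self-contained.
# Writing the C-contiguous transpose produces Fortran-order bytes on disk.
for path, size in zip(FILES, SIZES):
    demo = np.arange(32 * size, dtype='<i2').reshape(32, size)
    np.ascontiguousarray(demo.T).tofile(path)

data_out = np.memmap('output', mode='w+', dtype='<i2', order='F',
                     shape=(32, sum(SIZES)))

offset = 0
for path, size in zip(FILES, SIZES):
    data_in = np.memmap(path, mode='r', dtype='<i2', order='F', shape=(32, size))
    # Reorder the 32 channels (rows) while copying this file's columns.
    data_out[:, offset:offset + size] = data_in[channel_mapping, :]
    offset += size

data_out.flush()
```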

Is this the most efficient way to do it? I am afraid that applying channel_mapping via fancy indexing will materialize the whole reordered file in memory and slow the process down. As it is, this is much slower than simply concatenating the files.

  • If the order of the file has to be manually set then you're not going to find a very efficient way of reordering the entire file. – Bob Smith Mar 26 '20 at 21:55
  • You may be best to load the input files using memmap, because you need the random access; then don't construct the corrected sequence into `data_out` in memory, but write the sequence directly to the output file. – barny Mar 26 '20 at 22:47
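Following the suggestion in the comments, one way to keep the fancy-indexed temporary small is to copy in column chunks, so at most a 32 x `chunk` block is ever materialized at once. A sketch of that idea (`copy_reordered` and the chunk size are hypothetical names, not part of the question's code):

```python
import numpy as np

CHUNK = 1_000_000  # columns per chunk; tune to available RAM (assumed value)

def copy_reordered(data_in, data_out, col_offset, channel_mapping, chunk=CHUNK):
    """Copy data_in into data_out starting at column col_offset,
    reordering the rows, one column-chunk at a time."""
    n_cols = data_in.shape[1]
    for start in range(0, n_cols, chunk):
        stop = min(start + chunk, n_cols)
        # Only a (32, chunk) temporary is materialized per iteration.
        data_out[:, col_offset + start:col_offset + stop] = \
            data_in[channel_mapping, start:stop]
```

The same function works whether `data_in`/`data_out` are memmaps or plain arrays, so the multi-file loop just calls it once per file with a running column offset.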
