
I guess this question has been asked before, but unfortunately I have not been able to find an answer yet. Sorry if I missed one; if so, it would be nice if you could point me to it. Thanks.

I have a program which uses a number of large arrays (2- and 3-dimensional, but contiguous in memory) such as array2[t][x] and array3[t][x][y]. My program fills these arrays step by step: at some point, all x (or x and y) values for a given t are calculated and then stored in array2[t][x] (or array3[t][x][y]). The program runs on a cluster with wall-time limits, so I would like to write the arrays out to the hard disk before the wall time is over, and read them back into the same arrays when restarting the program, so that the entries computed so far do not have to be calculated again but can be used right away (i.e. every entry has to end up at the same spot again). I do not need the data in human-readable form, so it could also be saved in binary format.

So, what is the best (and perhaps most efficient) way to do both the writing and the reading in C (or C++)? As the arrays get filled step by step, it might be good to write them out after each step (or after every 10th step or so), I guess. (Might fstream be something to consider?)

I am very happy for any suggestion. Thanks a lot!

EDIT: Perhaps to clarify: I am not asking how arrays are stored in memory, but how to read and write them from memory from/to hard disk. Thanks for the suggestions so far!

Cari Baur
  • Instead of reading in/out, you should probably say reading/writing; it would be much clearer. If the data is contiguous in memory and the size of each element is the same over multiple runs, you should just be able to open a file in binary mode and write out the whole chunk; reading is then just the reverse. If the arrays aren't fixed-size, you will want to write a header at the beginning that contains the sizes. – Retired Ninja Jul 17 '14 at 15:09
  • Those are contiguous memory areas, you can simply write and read that memory – Marco A. Jul 17 '14 at 15:10
  • 1
    Sorry, but I do not see how this could be a duplicate of the linked question... – Cari Baur Jul 17 '14 at 16:01

2 Answers


If you're running on Linux, I would use mmap; if running on Windows, use file mapping (CreateFileMapping). In C you would open a file, mmap that file to your "array", and then operate on that region of memory.

If your array grows or contracts, there are some nuances, but in general this is how I would proceed.

stackmate
  • Thanks for the hint. I will have a deeper look, although I am quite unsure how to use it after a quick one. – Cari Baur Jul 17 '14 at 16:05
  • It gives you a chunk of memory that is shadowed by a file. Every change to the memory is written to the file. So just use the memory as the array and it will be automatically mirrored to the file. – Tyler Jul 17 '14 at 17:37
  • @Tyler: This may be a stupid question, but am I right that processing of an "mmap"ed array might take longer as it is basically read from disk and not from memory (or am I completely off here)? – Cari Baur Jul 28 '14 at 18:49
  • It depends on the size of the array, the amount of physical memory you have, and how aggressive the kernel is in trying to conserve physical memory. If the array is small enough to easily fit in physical memory, then the kernel will probably load it all into memory and all accesses are made at memory speed. If it is too large to fit, then there will be a penalty when you access parts that aren't in memory. – Tyler Jul 28 '14 at 18:54

You have the choice of weapons! If your data is contiguous, you can write and read it to/from the file as a flat 1D array.

Then, for the streams you'll use, you can choose whether you want to write text (readable, so that you can inspect/edit values manually) or binary.

Edit: Here is a small writing function using the binary approach (the stream must be opened in binary mode):

template <typename T>
void write_array_bin(ofstream &ofs, const T *array, size_t number_elements)
{
    // number_elements must be size_t here so that its on-disk size
    // matches what read_array_bin expects to read back
    ofs.write(reinterpret_cast<const char*>(&number_elements), sizeof(number_elements));
    ofs.write(reinterpret_cast<const char*>(array), sizeof(T)*number_elements);
}

And a reading function that returns a 1D array with all the values (dynamically allocated):

template <typename T>
T* read_array_bin(ifstream &ifs, size_t& number_elements)
{
    T *array = nullptr;
    ifs.read(reinterpret_cast<char*>(&number_elements), sizeof(number_elements));
    if (ifs) {
        array = new T[number_elements];
        if (!ifs.read(reinterpret_cast<char*>(array), sizeof(T)*number_elements)) {
            throw istream::failure("Incomplete read/inconsistent objects"); 
        }
    }  
    return array;
}

It's template-based, so you can use it with ints, floats, or whatever else. Here is a small example of use:

int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }; 
size_t sizec = 0;
{
    ofstream ofs("test2.txt", ios::binary);
    write_array_bin<int>(ofs, a, 10);
} // block, so that ofs is closed before we read the file back

ifstream ifs("test2.txt", ios::binary);
int *c = read_array_bin<int>(ifs, sizec);

I originally posted a text version, but re-reading your question, binary appears more suitable and more performant: it reads/writes all the data in one operation.
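For reference, a sketch of what the text variant could look like (function names here are my own invention; the format is simply the element count on the first line, then whitespace-separated values):

```cpp
#include <cstddef>
#include <fstream>
#include <ios>

// Text-mode counterparts of the binary functions: human-readable,
// but slower and larger on disk.
template <typename T>
void write_array_txt(std::ofstream &ofs, const T *array, std::size_t number_elements)
{
    ofs << number_elements << '\n';
    for (std::size_t i = 0; i < number_elements; ++i)
        ofs << array[i] << ' ';
    ofs << '\n';
}

template <typename T>
T* read_array_txt(std::ifstream &ifs, std::size_t &number_elements)
{
    T *array = nullptr;
    if (ifs >> number_elements) {
        array = new T[number_elements];
        for (std::size_t i = 0; i < number_elements; ++i)
            if (!(ifs >> array[i]))
                throw std::ios_base::failure("Incomplete read/inconsistent objects");
    }
    return array;
}
```

Be aware that for floating-point data the text form can silently lose precision unless you raise the stream's output precision; the binary version has no such issue.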

Christophe