1

I have a binary file in which I save the following variables millions of times:

  • a vector of floats of size x
  • two unsigned integers

Currently I'm using ifstream to open and read the file, but I was wondering whether I could speed up execution by loading the whole file into memory and reducing the number of I/O operations.

How can I load the file into memory and then convert it into the variables I want? With ifstream this is done easily but I don't know how to buffer it and then extract the data.

This is the code I'm using to save the data:

osfile.write(reinterpret_cast<const char*> (&sz), sizeof(int));// Size of vector
osfile.write(reinterpret_cast<const char*> (&vec[0]), sz*sizeof(float));
osfile.write(reinterpret_cast<const char*> (&a), sizeof(unsigned int));
osfile.write(reinterpret_cast<const char*> (&b), sizeof(unsigned int));
TheShadow
  • A similar question was asked the other week, I think it came down to using `ifstream#rdbuf`, I'll see if I can locate the question. – Jonny Henly Dec 15 '15 at 14:54
  • I have seen similar questions that get the file as a char array. But how should I read the variables from this array? – TheShadow Dec 15 '15 at 15:22
  • Is this vector size fixed, or does it vary for each vector? – Marko Popovic Dec 15 '15 at 15:31
  • People, please stop doing stuff like this: "osfile.write(reinterpret_cast<const char*>(&vec[0]), vec.size()*sizeof(float));" Use a proper serialization paradigm, don't write blobs. Blobs are bug-prone and not endian-safe. – KarenRei Dec 15 '15 at 15:39
  • The vector is fixed size and I know the size of the vector because the first thing saved in the file is an int which is the size of the vector. Is there a problem even if I have such a specific file? – TheShadow Dec 15 '15 at 16:10
  • @TheShadow Was the answer helpful to you? I mean, you wanted to know how to load entire file into memory at once and then parse the desired values ... – Marko Popovic Dec 25 '15 at 10:46

2 Answers

0

I guess something is missing in your write procedure because the size of the vector is missing in your write stream ...

size_t size = vec.size();
osfile.write(reinterpret_cast<const char*> (&size), sizeof(size_t));
osfile.write(reinterpret_cast<const char*> (&vec[0]), vec.size()*sizeof(float));

osfile.write(reinterpret_cast<const char*> (&i1), sizeof(unsigned int));
osfile.write(reinterpret_cast<const char*> (&i2), sizeof(unsigned int));

Then you can load the global file buffer into memory: Read whole ASCII file into C++ std::string
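A minimal sketch of that loading step, assuming the file fits comfortably in memory. The helper name `loadFile` is made up for illustration. The stream is opened in binary mode so that the bytes arrive unchanged:

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: returns the raw bytes of the file, or an empty
// string if the file cannot be opened.
std::string loadFile(const std::string& path) {
    // Binary mode: no newline translation, bytes are read as stored.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return std::string();
    }
    // Copy everything from the stream buffer into the string.
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

The returned string is used here only as a byte container, so embedded zero bytes are preserved.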

Then, pass the loaded buffer to an istringstream object (iss).

Then, read your stream the same way you wrote it (stream approach):

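float tmp;
size_t size_of_vector;
// read size of vector (note: >> parses text, so it only matches data written as text)
iss >> size_of_vector;
// allocate once; reserve() keeps the vector empty so push_back does not double its size
vector<float> vec;
vec.reserve(size_of_vector);
// read content
while(size_of_vector--)
{
    iss >> tmp;
    vec.push_back(tmp);
}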
// at the end, read your pair of int
unsigned int i1,i2;
iss >> i1;
iss >> i2;

EDIT: You still need to account for binary vs. text mode when opening and reading the streams ...
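For data written with write(), the matching unformatted call is read(). Here is a rough sketch of reading one record that way, assuming the layout written above (a size_t count, that many floats, then two unsigned ints), a trusted file, and the same machine for writing and reading. `Record` and `readRecord` are names I made up for illustration:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct Record {  // hypothetical name, for illustration only
    std::vector<float> vec;
    unsigned int i1 = 0;
    unsigned int i2 = 0;
};

// Reads one record stored as: size_t count, count floats, two unsigned ints.
// Returns false if the stream runs out of data before the record is complete.
bool readRecord(std::istream& in, Record& rec) {
    std::size_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof count)) {
        return false;
    }
    rec.vec.resize(count);  // assumes the stored count is trustworthy
    if (count > 0 &&
        !in.read(reinterpret_cast<char*>(rec.vec.data()),
                 static_cast<std::streamsize>(count * sizeof(float)))) {
        return false;
    }
    in.read(reinterpret_cast<char*>(&rec.i1), sizeof rec.i1);
    in.read(reinterpret_cast<char*>(&rec.i2), sizeof rec.i2);
    return static_cast<bool>(in);
}
```

With the whole file held in a std::string, an std::istringstream constructed from it can be passed to readRecord in a loop until it returns false.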

Community
norisknofun
  • Writing binary and reading text will probably not work very well. – molbdnilo Dec 15 '15 at 15:45
  • Of course, that's why I added my edit. This also depends on encoding, endianness, etc. Well, the question is about speed also. About that: since streams are not thread safe, this is the only valid lead. – norisknofun Dec 15 '15 at 16:08
  • Thanks for the answer. The size of the vector is fixed and is saved in the file as an int. Using this int I allocate the space needed for the vector. Won't loading the file as a string and then converting it to numbers be slow? I was wondering if I could use the binary file without conversion – TheShadow Dec 15 '15 at 16:14
  • The problem is that your suggested reading code doesn't work with your suggested writing code at all. – molbdnilo Dec 15 '15 at 16:15
0

Here is the approach I would suggest. First, read the whole file into a buffer:

std::ifstream binFile("your_binary_file", std::ifstream::binary);
if(binFile) {
    // get length of file
    binFile.seekg(0, binFile.end);
    size_t length = static_cast<size_t>(binFile.tellg());
    binFile.seekg(0, binFile.beg);

    // read whole contents of the file to a buffer at once
    char *buffer = new char[length];
    binFile.read(buffer, length);
    binFile.close();

    ...

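Then, extract the vector and the integers from the buffer (std::memcpy needs <cstring>):

    size_t offset = 0;
    // memcpy avoids the alignment and aliasing problems of casting the buffer pointer
    int vectorSize = 0;
    std::memcpy(&vectorSize, buffer + offset, sizeof(int));
    offset += sizeof(int);

    std::vector<float> floats(vectorSize);
    if (vectorSize > 0)
        std::memcpy(floats.data(), buffer + offset, sizeof(float) * vectorSize);
    offset += sizeof(float) * vectorSize;

    // the question writes these two values as unsigned int
    unsigned int i1 = 0, i2 = 0;
    std::memcpy(&i1, buffer + offset, sizeof(unsigned int));
    offset += sizeof(unsigned int);
    std::memcpy(&i2, buffer + offset, sizeof(unsigned int));
    offset += sizeof(unsigned int);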

Finally, when all data is read, don't forget to delete the memory allocated for the buffer:

    delete[] buffer;
}
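Since the file holds the same record many times, the extraction step can be repeated until the buffer is exhausted. The following is only a sketch under a few assumptions: the buffer is held in a std::vector<char> instead of a raw new[] array, the file was written on the same machine (same endianness and type sizes), and each record is an int count, that many floats, then two unsigned ints. `Entry`, `readValue` and `parseRecords` are hypothetical names:

```cpp
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

struct Entry {  // hypothetical name, for illustration only
    std::vector<float> floats;
    unsigned int a = 0;
    unsigned int b = 0;
};

// Copies sizeof(T) bytes out of the buffer at `offset` and advances `offset`.
// Returns false instead of reading past the end of the buffer.
template <typename T>
bool readValue(const std::vector<char>& buf, std::size_t& offset, T& out) {
    if (buf.size() - offset < sizeof(T)) {
        return false;
    }
    std::memcpy(&out, buf.data() + offset, sizeof(T));
    offset += sizeof(T);
    return true;
}

// Parses consecutive records and stops at the first truncated or
// malformed one, returning everything parsed up to that point.
std::vector<Entry> parseRecords(const std::vector<char>& buf) {
    std::vector<Entry> records;
    std::size_t offset = 0;
    while (offset < buf.size()) {
        int count = 0;
        if (!readValue(buf, offset, count) || count < 0) {
            break;
        }
        const std::size_t bytes = static_cast<std::size_t>(count) * sizeof(float);
        if (buf.size() - offset < bytes) {
            break;
        }
        Entry rec;
        rec.floats.resize(static_cast<std::size_t>(count));
        if (bytes > 0) {
            std::memcpy(rec.floats.data(), buf.data() + offset, bytes);
        }
        offset += bytes;
        if (!readValue(buf, offset, rec.a) || !readValue(buf, offset, rec.b)) {
            break;
        }
        records.push_back(std::move(rec));
    }
    return records;
}
```

A buffer read as shown earlier could be handed over with std::vector<char> data(buffer, buffer + length), or the file could be read directly into the vector. The bounds checks cost little compared with the disk read and keep a truncated file from causing out-of-range reads.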
Marko Popovic
  • Thanks for the answer. I currently use Cereal and serialize the object. Do you think your approach would be faster? I don't get much better execution times with serialization from ifstream. – TheShadow Dec 17 '15 at 16:42
  • @TheShadow Yes, I think it would be faster if you are doing a lot of small loads. – Marko Popovic Dec 17 '15 at 22:27