
I've used the code below to read a binary file (in my case a .docx file) and store it in an unsigned char array instead of a plain char array (based on the question "Reading and writing binary file").

#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    std::ifstream input("C:\\test.docx", std::ios::binary);
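    // The iterator's character type must match std::ifstream (char);
    // each byte is converted to unsigned char as it is stored.
    std::vector<unsigned char> buffer((std::istreambuf_iterator<char>(input)),
                                      (std::istreambuf_iterator<char>()));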
}

Now I have two questions.

First, is this a correct way to read a .docx file into an unsigned char array, or are there better options available?

Second, I need to print the contents of the file that were read into the unsigned char array, just to verify that the file was read correctly. How can that be achieved?

DD25

1 Answer


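That is an OK way if you're fine with having the whole file in memory. Keep the character type of std::istreambuf_iterator as char so that it matches std::ifstream; each byte is converted to unsigned char as it is stored in the vector. If you want to process the file in parts instead, read it in fixed-size chunks in a loop. One use case is transmitting the file over the network, where you don't need the whole file in memory at once.

As for printing the file, you can print the bytes that were read, for example like this: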

#include <fstream>
#include <iterator>
#include <vector>
#include <iostream>
#include <iomanip>

int main()
{
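    // Open in binary mode so that no newline translation is applied.
    std::ifstream input("C:\\test.docx", std::ios::binary);
    // The iterator's character type must match std::ifstream (char);
    // each byte is converted to unsigned char as it is stored.
    std::vector<unsigned char> buffer((std::istreambuf_iterator<char>(input)),
                                      (std::istreambuf_iterator<char>()));

    // Print every byte as a two-digit hexadecimal value.
    std::cout << std::hex << std::setfill('0');
    for (unsigned char b : buffer)
        std::cout << "0x" << std::setw(2) << static_cast<int>(b) << " ";
    std::cout << std::dec << std::endl;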
}

If you meant printing the contents of the file to see some familiar text, that's not going to work directly. docx files use the Office Open XML format, which means each one is a ZIP archive. Inside the archive you will find XML representations of the document's data, which are readable once extracted.
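Because a .docx file is a ZIP archive, one quick sanity check on the buffer is to look for the ZIP local file header signature, the bytes 0x50 0x4B 0x03 0x04 ("PK" followed by 3 and 4), at the very start. The sketch below assumes the archive is not empty, which should hold for any real document; the function name starts_with_zip_signature is made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if `buffer` begins with the ZIP local
// file header signature 0x50 0x4B 0x03 0x04.
bool starts_with_zip_signature(const std::vector<unsigned char>& buffer)
{
    static const unsigned char signature[] = {0x50, 0x4B, 0x03, 0x04};
    if (buffer.size() < sizeof signature)
        return false;
    for (std::size_t i = 0; i < sizeof signature; ++i) {
        if (buffer[i] != signature[i])
            return false;
    }
    return true;
}
```

In the hex dump printed by the code above, this corresponds to the output starting with 0x50 0x4b 0x03 0x04.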

kobigurk
  • So how can I make sure that the file has been read and stored in the unsigned char array successfully? – DD25 Oct 09 '16 at 11:21
  • You could use the method I wrote in the answer and additionally use a hex editor, such as "010 Editor" on Windows or "hd" on Linux, to see that some parts match. You could also write the file back to disk under another name and do a diff between the two. – kobigurk Oct 09 '16 at 11:24
  • @DD25: open the original with a good hex editor and compare it to the output on screen. Why would you suspect that this straightforward code does *not* correctly read the file? – Jongware Oct 09 '16 at 11:24
  • @kobigurk: the fun thing (counter to what I say above) is that *sometimes* my *reading* code has a tiny bug; say, an unsigned vs. signed mixup, or a linked list that goes astray - and Development Is Halted until I finally go back all the way to "let's see what *actually* gets stored in memory ..." – Jongware Oct 09 '16 at 11:27
  • @kobigurk I will do a diff between the original and the file written back to disk. However, about the part where you mentioned that I could read the file in parts and iterate over it: can you tell me how to achieve that? – DD25 Oct 09 '16 at 11:50
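The round-trip check suggested in the comments (write the bytes back under another name and compare) can also be done in code. This is only a sketch under assumptions: the helper names read_file, write_file and round_trip_matches are invented here, and comparing the two byte vectors stands in for running an external diff tool.

```cpp
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file into memory as raw bytes.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_file(const std::string& path)
{
    std::ifstream input(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(input),
                                      std::istreambuf_iterator<char>());
}

// Hypothetical helper: writes raw bytes to a file; returns false on failure.
bool write_file(const std::string& path, const std::vector<unsigned char>& data)
{
    std::ofstream output(path, std::ios::binary);
    output.write(reinterpret_cast<const char*>(data.data()),
                 static_cast<std::streamsize>(data.size()));
    output.flush();
    return static_cast<bool>(output);
}

// Reads the original, writes a copy, reads the copy back, and compares bytes.
bool round_trip_matches(const std::string& original_path,
                        const std::string& copy_path)
{
    const std::vector<unsigned char> original = read_file(original_path);
    if (!write_file(copy_path, original))
        return false;
    return read_file(copy_path) == original;
}
```

For the file in the question, a call such as round_trip_matches("C:\\test.docx", "C:\\test_copy.docx") (the copy path is an arbitrary choice) should return true if reading and writing preserve every byte.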