0

I am reading a file header using ifstream. Edit: I was asked to put the full minimal program, so here it is.

#include <iostream>
#include <fstream>


using namespace std;

#pragma pack(push,2)

struct Header
{
    char label[20];
    char st[11];
    char co[7];
    char plusXExtends[9];
    char minusXExtends[9];
    char plusYExtends[9];
};

#pragma pack(pop)

int main(int argc,char* argv[])
{
    string fileName;
    fileName = "test";

    string fileInName = fileName + ".dst";

    ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);
    if (!fileIn)
    {
       cout << "File Not Found" << endl;
       return 0;
    }

    Header h={};
    if (fileIn.is_open()) {
                cout << "\n" << endl;
                fileIn.read(reinterpret_cast<char *>(&h.label), sizeof(h.label));
                cout << "Label: " << h.label << endl;
                fileIn.read(reinterpret_cast<char *>(&h.st), sizeof(h.st));
                cout << "Stitches: " << h.st << endl;
                fileIn.read(reinterpret_cast<char *>(&h.co), sizeof(h.co));
                cout << "Colour Count: "  << h.co << endl;
                fileIn.read(reinterpret_cast<char *>(&h.plusXExtends),sizeof(h.plusXExtends));
                cout << "Extends: "  << h.plusXExtends << endl;
                fileIn.read(reinterpret_cast<char *>(&h.minusXExtends),sizeof(h.minusXExtends));
                cout << "Extends: "  << h.minusXExtends << endl;
                fileIn.read(reinterpret_cast<char *>(&h.plusYExtends),sizeof(h.plusYExtends));
                cout << "Extends: "  << h.plusYExtends << endl;

// This will output corrupted
                cout << endl << endl;
                cout << "Label: " << h.label << endl;
                cout << "Stitches: " << h.st << endl;
                cout << "Colour Count: "  << h.co << endl;
                cout << "Extends: "  << h.plusXExtends << endl;
                cout << "Extends: "  << h.minusXExtends << endl;
                cout << "Extends: "  << h.plusYExtends << endl;
    }


    fileIn.close();

    cout << "\n";
    //cin.get();
    return 0;
}


ifstream fileIn(fileInName.c_str(), ios_base::binary|ios_base::in);

Then I use a struct to store the header items

The actual struct is longer than this. I shortened it because I didn't need the whole struct for the question. Anyway as I read the struct I do a cout to see what I am getting. This part is fine.

As expected my cout shows the Label, Stitches, Colour Count no problem. The problem is that if I want to do another cout after it has read the header I am getting corruption in the output. For instance if I put the following lines right after the above code eg

Instead of seeing Label, Stitches and Colour Count I get strange symbols, and corrupt output. Sometimes you can see the output of the h.label, with some corruption, but the labels are Stitches are written over. Sometimes with strange symbols, but sometimes with text from the previous cout. I think either the data in the struct is getting corrupted, or the cout output is getting corrupted, and I don't know why. The longer the header the more the problem becomes apparent. I would really like to do all the couts at the end of the header, but if I do that I see a big mess instead of what should be outputting.

My question is why is my cout becoming corrupted?

Corruption

Community
  • 1
  • 1
Thomas Williams
  • 1,306
  • 1
  • 11
  • 30
  • 1
    This is wrong: `&h.label`. Better: `h.label`. Or equivalently, `&h.label[0]`. Make your changes, and see if that helps. Read this for more details: [What is Array Decaying?](https://stackoverflow.com/questions/1461432/what-is-array-decaying) – paulsm4 Jul 10 '19 at 23:29
  • so your second set of couts are immediately after the first ones (where you read the data)? – Cruz Jean Jul 10 '19 at 23:42
  • 1
    We need a [mcve], which is a full program and the necessary input files that we can use to reproduce the problem without having to fill in anything. You can and should omit parts of the code not relevant to the problem, but the result should still be a complete program and it should still reproduce the problem. Currently, it appears that your problem is in some parts of your code which you didn't show. – BessieTheCookie Jul 10 '19 at 23:42
  • @paulsm4 In this specific context, &h.label, h.label, and &h are all the same memory location, so that isn't the problem. I agree though, the additional & isn't a particularly good style. Personally I'd just read the entire header in one go: fileIn.read(reinterpret_cast(&h), sizeof(h)); – robthebloke Jul 11 '19 at 00:11
  • You are only showing half of the story. How were those items saved? You need to save the items first before you can read them. – PaulMcKenzie Jul 11 '19 at 00:15
  • I have now added my full, code. I can't add the test.dst though because that is a binary file and there is no way to add binary files here. & is a memory location. The file is a DST file which is a dakoma file. The items I am getting in are fine, and I am using them. It is just the cout which is corrupted, but only if I cout twice, or after the header – Thomas Williams Jul 11 '19 at 00:18
  • If it's a binary file, why aren't you reading the entire struct in one read? Why are you breaking up the reads into several lines? In addition to that, I see no null-termination handling at all in your code for those char arrays. – PaulMcKenzie Jul 11 '19 at 00:20
  • Because I followed a tutorial and this is how it was done. I didn't know you could read it all in one go – Thomas Williams Jul 11 '19 at 00:22
  • `cout << "Label: " << h.label` means to read a null-terminated string from the location of `h.label` , but your code never placed such a string there – M.M Jul 11 '19 at 00:25
  • But the fileIn is reading it into that label, or am I wrong. The first set of couts show correctly though. It is only when I have to output the second lot of couts that it shows corruption. Actually if I don't output the first lot the second lot are still corrupted though – Thomas Williams Jul 11 '19 at 00:27
  • @ThomasWilliams `fileIn.read(reinterpret_cast(&h), sizeof(h));` -- This will work, but note that this will only will work for POD types (which your struct is), and that your struct's members line up with the data in the file (i.e. probably needs to be byte packed). – PaulMcKenzie Jul 11 '19 at 00:27
  • `cout` takes some characters for the sake of formatting, such a choosing hex format for an integer to follow, or switching back to decimal. Have you checked your string to ensure it has none being interpreted? (I don't know details of how that works) – donjuedo Jul 11 '19 at 00:28
  • The data works fine. I am using this to build an svg and everything works as it should.The only thing that fails is the cout. – Thomas Williams Jul 11 '19 at 00:28
  • 2
    Are your strings null-terminated? Again, your code shows nothing in terms of ensuring this. That's probably why `cout` is going haywire. If you give `cout` a character array, it is assuming it is a null-terminated string. – PaulMcKenzie Jul 11 '19 at 00:30
  • HI Paul, they are just a header in a binary file. I doubt they are null terminated. However as you can see from my picture, the first lot of couts output correclty, the second lot do not. What can I do to solve this null termination? – Thomas Williams Jul 11 '19 at 00:31
  • If they're not null terminated, then you need to come up with a way to determine where the actual end of the string you read in is. The `sizeof` will not give you this information -- all it does is return the number of bytes your `char` array consists of. So you need to inspect your string (not just the displayable characters, all the characters using your debugger), and come up with a way to determine the real length of the string. Probably a loop until your char array value at a certain index equals something, and then that's where you stick the null. – PaulMcKenzie Jul 11 '19 at 00:36
  • `fileIn.read()` is a basic problem here. You are telling it to use each entire field (fixed size), without knowing the variable size of data that will come in. – donjuedo Jul 11 '19 at 00:37
  • @ThomasWilliams -- In the file, the character array positions that are not filled -- are these space characters? If so, then possibly `cout.write(h.label, sizeof(label));` and similar would work. But only if the excess positions in the array are spaces. – PaulMcKenzie Jul 11 '19 at 00:42
  • @donjuedo `cout`, or more correctly, the stream insertion operator (`< – BessieTheCookie Jul 11 '19 at 01:20
  • @AlexanderZhang, Here is detailed description of what I was thinking of: https://en.cppreference.com/w/cpp/io/manip But looking more closely at the OP, this idea does seem off track. – donjuedo Jul 11 '19 at 12:05
  • @donjuedo Those are separate types which aren't part of any inserted string. – BessieTheCookie Jul 11 '19 at 18:18
  • @AlexanderZhang, Ah, right. I see what you mean now. – donjuedo Jul 11 '19 at 18:57

1 Answers1

2

Using arrays to store strings is dangerous because if you allocate 20 characters to store the label and the label happens to be 20 characters long, then there is no room to store a NUL (0) terminating character. Once the bytes are stored in the array there's nothing to tell functions that are expecting null-terminated strings (like cout) where the end of the string is.

Your label has 20 chars. That's enough to store the first 20 letters of the alphabet: ABCDEFGHIJKLMNOPQRST

But this is not a null-terminated string. This is just an array of characters. In fact, in memory, the byte right after the T will be the first byte of the next field, which happens to be your 11-character st array. Let's say those 11 characters are: abcdefghijk.

Now the bytes in memory look like this: ABCDEFGHIJKLMNOPQRSTabcdefghijk

There's no way to tell where label ends and st begins. When you pass a pointer to the first byte of the array that is intended to be interpreted as a null-terminated string by convention, the implementation will happily start scanning until it finds a null terminating character (0). Which, on subsequent reuses of the structure, it may not! There's a serious risk of overrunning the buffer (reading past the end of the buffer), and potentially even the end of your virtual memory block, ultimately causing an access violation / segmentation fault.

When your program first ran, the memory of the header structure was all zeros (because you initialized with {}) and so after reading the label field from disk, the bytes after the T were already zero, so your first cout worked correctly. There happened to be a terminating null character at st[0]. You then overwrite this when you read the st field from disk. When you come back to output label again, the terminator is gone, and some characters of st will get interpreted as belonging to the string.

To fix the problem you probably want to use a different, more practical data structure to store your strings that allows for convenient string functions. And use your raw header structure just to represent the file format.

You can still read the data from disk into memory using fixed sized buffers, this is just for staging purposes (to get it into memory) but then store the data into a different structure that uses std::string variables for convenience and later use by your program.

For this you'll want these two structures:

#pragma pack(push,2)
struct RawHeader  // only for file IO
{
    char label[20];
    char st[11];
    char co[7];
    char plusXExtends[9];
    char minusXExtends[9];
    char plusYExtends[9];
};
#pragma pack(pop)

struct Header  // A much more practical Header struct than the raw one
{
    std::string label;
    std::string st;
    std::string co;
    std::string plusXExtends;
    std::string minusXExtends;
    std::string plusYExtends;
};

After you read the first structure, you'll transfer the fields by assigning the variables. Here's a helper function to do it.

#include <string>
#include <string.h>

template <int n> std::string arrayToString(const char(&raw)[n]) {
    return std::string(raw, strnlen_s(raw, n));
}

In your function:

Header h;
RawHeader raw;

fileIn.read((char*)&raw, sizeof(raw));
// Now marshal all the fields from the raw header over to the practical header.
h.label         = arrayToString(raw.label);
h.st            = arrayToString(raw.st);
h.st            = arrayToString(raw.st);
h.co            = arrayToString(raw.co);
h.plusXExtends  = arrayToString(raw.plusXExtends);
h.minusXExtends = arrayToString(raw.minusXExtends);
h.plusYExtends  = arrayToString(raw.plusYExtends);

It's worth mentioning that you also have the option of keeping the raw structure around and not copying your raw char arrays to std::strings when you read the file. But you must then be certain that when you want to use the data, you always to compute and pass lengths of the strings to functions that will deal with those buffers as string data. (Similar to what my arrayToString helper does anyway.)

Wyck
  • 5,985
  • 4
  • 34
  • 44
  • I was going to mention more about how the `pragma pack 2` _could_ cause the odd-length buffers to possibly have a terminating null byte, or junk, by virtue of padding to 2-byte alignment. But the answer is long winded as is and the behaviour is implementation specific. – Wyck Jul 11 '19 at 04:04
  • It looks like you have answered the question and I totally understand what you mean, but I am having a small problem with the line fileIn.read(&raw, sizeof(raw)) as I am getting an error //no matching function for call to std::basic_ifstream::read(RawHeader*, unsigned int). I am going to have my tea and have another look later. I may have typed something in wrong. So it looks like that one line would replace all my fileIn.read lines in one go. Once I get it to work that will be excellent. Will get back to you in a couple of hours once I have tried a few things. – Thomas Williams Jul 11 '19 at 20:39
  • I didn't have time to get back straight away. I worked out that it needed char, but I am still a little stuck as I am compiling in codeblocks and strnlen_s only works in MSVC. I am going to tick you anyway because your answer is correct. I just need an alternative to the strnlen_s. I have tried std::string test(raw.label, sizeof(raw.label)); and that works to a point, just need to make it into a function now and it should be ok. – Thomas Williams Jul 12 '19 at 20:17
  • Ok I found a way to do it slightly different but it works h.label = std::string(raw.label, sizeof(raw.label));. Tried to put it in a function but not quite working like that, but it works straight. – Thomas Williams Jul 12 '19 at 20:34
  • @ThomasWilliams `h.label = std::string(raw.label, sizeof(raw.label));` does **not** work correctly if there is a null terminator in the string. `{'a', 'b', 'c', '\0', 'x', 'x', 'x' }` will get converted to a string with length 7, but that some code (including the visual studio debugger) will display as `abc` and other code (including cout) will display as `abc xxx`. I strongly suggest using an approach that does not embed trailing nulls or spaces or junk. – Wyck Jul 13 '19 at 00:35
  • Yeh I agree. I am looking for an alternative to using strnlen_s, but so far haven't found one. I have never used templates before either. I usually write functions. A template looks like a single class, but never came across them in c++ before, but then the language changes so much over the years. – Thomas Williams Jul 13 '19 at 12:55