-3

I'm writing a function in my Visual C++ project that reads contents of a file via WinAPI in 2000 byte increments and returns it as a std::string.

A problem occurs when the file is much larger than the buffer (for example 100 KB), I get garbage added at several locations in the file in the middle of valid data. This is a long 0xcccccccc... sequence terminated by 3-4 other bytes, usually appearing in the middle of a word. The function doesn't fail otherwise and none of the valid data is missing.

I haven't checked the exact positions but it seems that this happens at buffer size increments (or a multiplier of buffer size increments). If I increase the size of the buffer to more than the size of the test files, the problem goes away. What causes this to happen? What am I doing wrong?

std::string read_file(std::string filename) {
    HANDLE hFile = CreateFile(filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, NULL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        std::string errortext("Error opening " + filename + ", bad handle value: " + to_string((int)hFile));
        MessageBox(hwnd, errortext.c_str(), "Error", 0);
        return "";
    }
    char buffer[2000] = "";
    std::string entire_file = "";
    DWORD dwBytesRead = 0;

    while (ReadFile(hFile, buffer, sizeof(buffer), &dwBytesRead, NULL))
    {
        if (!dwBytesRead)
            break;
        entire_file += buffer;
    }
    CloseHandle(hFile);
    return entire_file;
}
Liz
  • 3
  • 1

1 Answers1

5
entire_file += buffer;

assumes that buffer is a nul terminated string which, in your case, it isn't.

Try this

entire_file.append(buffer, dwBytesRead);

This is a good example of code which should have been ringing alarm bells, because you didn't use the dwBytesRead variable (except to terminate the loop).

john
  • 71,156
  • 4
  • 49
  • 68
  • Thanks, I've figured that was probably it. Initializing the buffer with calloc and reading one byte less than its size in the loop fixed the problem. – Liz Jul 01 '19 at 20:27
  • Nope. That doesn't fix the problem. You will fail to process the final block of text correctly that way. Again, I don't see the point of the loop. – David Heffernan Jul 01 '19 at 21:37
  • @DavidHeffernan How so? Code looks ok to me. Also don't see how to process without a loop unless you know the size of the file beforehand. If you have improvements or alternatives it would be nice to see some actual code. – john Jul 02 '19 at 06:30
  • Your code is OK. What Liz described in the comment above is not. Final concatenation will not be null terminated. "Unless you know the size of the file beforehand". We'll, that is known. So allocate the string to that length and read into its buffer. – David Heffernan Jul 02 '19 at 06:39