WinAPI ReadFile returns corrupted data

Question

I'm writing a function in my Visual C++ project that reads contents of a file via WinAPI in 2000 byte increments and returns it as a std::string.

A problem occurs when the file is much larger than the buffer (for example 100 KB), I get garbage added at several locations in the file in the middle of valid data. This is a long 0xcccccccc... sequence terminated by 3-4 other bytes, usually appearing in the middle of a word. The function doesn't fail otherwise and none of the valid data is missing.

I haven't checked the exact positions but it seems that this happens at buffer size increments (or a multiplier of buffer size increments). If I increase the size of the buffer to more than the size of the test files, the problem goes away. What causes this to happen? What am I doing wrong?

std::string read_file(std::string filename) {
    HANDLE hFile = CreateFile(filename.c_str(), GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, NULL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        std::string errortext("Error opening " + filename + ", bad handle value: " + to_string((int)hFile));
        MessageBox(hwnd, errortext.c_str(), "Error", 0);
        return "";
    }
    char buffer[2000] = "";
    std::string entire_file = "";
    DWORD dwBytesRead = 0;

    while (ReadFile(hFile, buffer, sizeof(buffer), &dwBytesRead, NULL))
    {
        if (!dwBytesRead)
            break;
        entire_file += buffer;
    }
    CloseHandle(hFile);
    return entire_file;
}

Why are you using winapi native functions to read the file? What's wrong with `std::ifstream`? — πάντα ῥεῖ, Jun 30 '19 at 06:18
Also `entire_file += buffer;` looks problematic. You sohuld ensure that what's read from the buffer is nul terminated (e.g. use `memset()` before reading). — πάντα ῥεῖ, Jun 30 '19 at 06:20
`memset` won't do jack if the read fills the buffer completely. You'd also have to reserve an extra byte for the terminator or the bug'll be bach. — user4581301, Jun 30 '19 at 06:28
The loop seems pointless. Read the entire file in a single call to ReadFile. — David Heffernan, Jun 30 '19 at 07:24
Btw, 2048 is a better size than 2000. Also, on some (all?) compilers you're preventing RVO with two different return statements, you could move the `entire_file` variable declaration before the first `if` and do `return entire_file` instead of `return ""` — Mirko, Jun 30 '19 at 10:23
[What is the best way to read an entire file into a std::string in C++?](https://stackoverflow.com/questions/116038/) — Remy Lebeau, Jun 30 '19 at 23:43
Thanks for the tips. I'm new to C++ and appreciate the help. — Liz, Jul 01 '19 at 20:31

score 5 · Accepted Answer · answered Jun 30 '19 at 06:19

5

entire_file += buffer;

assumes that buffer is a nul terminated string which, in your case, it isn't.

Try this

entire_file.append(buffer, dwBytesRead);

This is a good example of code which should have been ringing alarm bells, because you didn't use the dwBytesRead variable (except to terminate the loop).

answered Jun 30 '19 at 06:19

john

71,156
4
49
68

Thanks, I've figured that was probably it. Initializing the buffer with calloc and reading one byte less than its size in the loop fixed the problem. – Liz Jul 01 '19 at 20:27
Nope. That doesn't fix the problem. You will fail to process the final block of text correctly that way. Again, I don't see the point of the loop. – David Heffernan Jul 01 '19 at 21:37
@DavidHeffernan How so? Code looks ok to me. Also don't see how to process without a loop unless you know the size of the file beforehand. If you have improvements or alternatives it would be nice to see some actual code. – john Jul 02 '19 at 06:30
Your code is OK. What Liz described in the comment above is not. Final concatenation will not be null terminated. "Unless you know the size of the file beforehand". We'll, that is known. So allocate the string to that length and read into its buffer. – David Heffernan Jul 02 '19 at 06:39

WinAPI ReadFile returns corrupted data

1 Answers1