1

I am trying to write a program to compare two text files using UNIX API calls. Here are the contents of my two files:

f1.txt

This is my sample.
It contains text
And for some reason
The last few chars
are duplicated?

f2.txt

This is another sample
Sometimes instead of
duplicating the last few chars,
it prints another new line
instead
4567865

I have a cpp file that opens and reads these files. My OpenRead function takes a filename as a c string, and puts the contents of the text file into a string and returns it.

    #include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <iostream>
#include <string>
#include <cstring>



using namespace std;

string OpenRead(const char*);

int main(int argc, char **argv)
{
    string text1 = "", text2 = "";

    string file1(argv[1]);
    string file2(argv[2]);


    text1 = OpenRead(file1.c_str());
    text2 = OpenRead(file2.c_str());
    cout << text1 << endl;
    cout << text2 << endl;

    exit(EXIT_SUCCESS);


 return 0;
}



string OpenRead(const char* filename)
{
    int inFD1;
    string text;

    char * buf = new char[fsize(filename)];

    inFD1 = open(filename, O_RDONLY, 0);
    if(inFD1 < 0) exit(EXIT_FAILURE);
    else
    {
         while (read(inFD1, buf, sizeof(int)) != 0) 
            text += buf; //cout << buf;

    }   

    close(inFD1);
    delete [] buf;
    return text;
}

size_t fsize(const char *filename) {
    struct stat st; 

    if (stat(filename, &st) == 0)
        return st.st_size;

    return -1; 
}

The issue is that when I compile this into an executable and run my command: fileComp f1.txt f2.txt, it opens and reads them almost perfectly fine, but produces strange output where extra characters are appended to the end. Here is what the output looks like:

This is my sample.
It contains text
And for some reason
The last few chars
are duplicated?
e
This is another sample
Sometimes instead of
duplicating the last few chars,
it prints another new line
instead
4567865
8

For some reason it appends on an e to the first file and an 8 to the second. This behavior varies among text files, but it always appends random characters from the buffer to the end.

Christophe
  • 54,708
  • 5
  • 52
  • 107
  • It's C++ because `namespace` and `string` do not exist in C. I've removed the misleading tag. – Christophe Mar 01 '18 at 23:00
  • 4
    `text += buf;` expects `buf` to be a null-terminated string. `read()` doesn't add a null terminator. – Barmar Mar 01 '18 at 23:00
  • @Bamar thank you. how exactly do I fix this? –  Mar 01 '18 at 23:02
  • 2
    Your mix of C and C++ is just awful. Either stick to C or stick to C++, but don't mix them. You are wasting memory, why do you allocate `fsize(filename)`if you are only ready `sizeof(int)` blocks of memory? – Pablo Mar 01 '18 at 23:03
  • I also don't fully understsand these UNIX functions. I am trying to learn –  Mar 01 '18 at 23:07
  • @MiltonJames: the point Pablo was trying to make is that you are clearly writing code in C++, so you should be using C++ style file I/O instead of C style file I/O. There is no need to use those Unix functions at all when C++ has other options available. See [What is the best way to read an entire file into a std::string in C++?](https://stackoverflow.com/questions/116038/) – Remy Lebeau Mar 01 '18 at 23:29

4 Answers4

2

Usual error. The problem is in your code:

while (read(inFD1, buf, sizeof(int)) != 0) 
    text += buf; //cout << buf;

This should be:

int count;
while ((count = read(inFD1, buf, sizeof(int))) > 0) 
    text.append(buf, count); 

NB:

  • It doesn't make sense to declare buf with the length of the filename, or to pass anything less than the actual length of buf to the read() function.
  • You don't need to allocate buf dynamically. Just use char buf[4096]; like everybody else.
user207421
  • 289,834
  • 37
  • 266
  • 440
2

The main issue I see is that you don't check the result of read for the count of characters read.

    while (read(inFD1, buf, sizeof(int)) != 0) 

You ask for sizeof(int) bytes on each read. But it may return less than this. So you really should have that value.

Also when adding buffer to the result text you assume that the buffer is already zero'd out (and thus you get a default null terminator '\0')

        text += buf; //cout << buf;

Note the operator+= assumes that buf is a C-String and is thus null terminated. You don't provide that guarantee with your code.

int len;
while ((len = read(inFD1, buf, sizeof(int))) > 0) {
    text.append(buf, len); 
}

Once you get it working. Take it to https://codereview.stackexchange.com for a review on best practices.

Martin York
  • 234,851
  • 74
  • 306
  • 532
0

Your reading loop is broken:

     while (read(inFD1, buf, sizeof(int)) != 0) 
        text += buf; 

Cause of the problem

First, you must verify that read() returns a positive number, because it could as well return a negative error message, leaving buf in an unknown state.

Then you must get the number of bytes read (return value of read() ), to know where the buffer content ends, otherwise you could have remaining unitialized characters in the buffer. If there's no null terminator at the end, you might very well get UB.

Finally, I hope that your file is not binary, because any null that would be read would be interpreted as a string terminator, causing the input to be truncated.

Better approach

Note also that you reserve a buffer that could contain the full file, but read only a couple of bytes each time (sizeof(int) bytes exactly), and then copy them into a string. Why not use C++ streams and read content either with getline() for string convenience or with fstream::read(), reading larger chunks of data ?

Christophe
  • 54,708
  • 5
  • 52
  • 107
  • Yep just switched to binary files and now all I get is terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc Abort –  Mar 01 '18 at 23:14
  • Yes, it may well be that it doesn't find a null terminator and so thinks it's a huge string... In the link behind fstream::read() you can find an example where the full file is loaded in memory with one read. – Christophe Mar 01 '18 at 23:24
0

The statement text += buf expects buf to be null terminated. However, your loop is not placing a null character after the data it reads from the file. At the very least, you would need to add this to your loop:

ssize_t len;
while ((len = read(inFD1, buf, sizeof(int))) > 0) 
{
    buf[len] = 0; // <-- add this
    text += buf;
}

Which BTW would break if buf is not allocted to at least sizeof(int) bytes in size, or the file data contains any null bytes in it.

You can avoid the null terminator issue by using the std::string::append() method instead:

ssize_t len;
while ((len = read(inFD1, buf, sizeof(int))) > 0) 
    text.append(buf, len);

That being said, since you are allocating buf to be the full size of the file, but your loop is only reading sizeof(int) number of bytes per iteration, you are wasting fsize(filename) - sizeof(int) number of bytes when fsize(filename) > sizeof(int) (and your code has undefined behavior when fsize(filename) < sizeof(int)).

You should use a fixed size for buf instead. There is no need to allocate more than you actually read on each loop iteration:

string OpenRead(const char* filename)
{
    int inFD1 = open(filename, O_RDONLY, 0);
    if (inFD1 < 0) exit(EXIT_FAILURE);

    string text;
    char buf[1024];
    ssize_t len;

    do {
        len = read(inFD1, buf, sizeof(buf));
        if (len == -1) {
            close(inFD1);
            exit(EXIT_FAILURE);
            break;
        }
        if (len == 0) break;
        text.append(buf, len);
    }
    while (true);

    close(inFD1);
    return text;
}

Or, simply pre-allocate text instead, and then read the file directly into text. That way, you don't need to allocating a temporary buffer at all:

string OpenRead(const char* filename)
{
    size_t size = fsize(filename);
    if (size < 0) exit(EXIT_FAILURE);

    string text;

    if (size > 0) {
        int inFD1 = open(filename, O_RDONLY, 0);
        if (inFD1 < 0) exit(EXIT_FAILURE);

        text.resize(size);

        char *buf = &text[0];
        ssize_t len;

        do {
            len = read(inFD1, buf, size);
            if (len <= 0) {
                close(inFD1);
                exit(EXIT_FAILURE);
                break;
            }
            buf += len;
            size -= len;
        }
        while (size > 0);    

        close(inFD1);
    }

    return text;
}

That being said, what you are doing is the C way of reading a file. You are writing C++ code, so you should use C++ style file I/O instead. There are many ways to read a file into a std::string. See What is the best way to read an entire file into a std::string in C++?.

Remy Lebeau
  • 454,445
  • 28
  • 366
  • 620