1

This is a homework assignment, just for all that want to know.

I'm writing a vocabulary translator (english -> german and vice versa) and am supposed to save everything the user does to file. Simple enough.

This is the code:

std::string file_name(user_name + ".reg");
std::ifstream file(file_name.c_str(), std::ios::binary | std::ios::ate);
// At this point, we have already verified the file exists. This shouldn't ever throw!
// Possible scenario:  user deletes file between calls.
assert( file.is_open() );

// Get the length of the file and reset the seek.
size_t length = file.tellg();
file.seekg(0, std::ios::beg);

// Create and write to the buffer.
char *buffer = new char[length];
file.read(buffer, length);
file.close();

// Find the last comma, after which comes the current dictionary.
std::string strBuffer = buffer;
size_t position = strBuffer.find_last_of(',') + 1;
curr_dict_ = strBuffer.substr(position);

// Start the trainer; import the dictionary.
trainer_.reset( new Trainer(curr_dict_.c_str()) );

The problem is, apparently, the curr_dict_ which is supposed to store my dictionary value. For example, my teacher has one dictionary file named 10WS_PG2_P4_de_en_gefuehle.txt. The Trainer imports the entire contents of the dictionary file like so:

std::string s_word_de;
std::string s_word_en;
std::string s_discard;
std::string s_count;
int i_word;

std::ifstream in(dictionaryDescriptor);

if( in.is_open() )
{
    getline(in, s_discard); // Discard first line.
    while( in >> i_word &&
        getline(in, s_word_de, '<') &&
        getline(in, s_discard, '>') &&
        getline(in, s_word_en, '(') &&
        getline(in, s_count, ')') )
    {   
        dict_.push_back(NumPair(s_word_de.c_str(), s_word_en.c_str(), Utility::lexical_cast<int, std::string>(s_count)));
    }
}
else
    std::cout << dictionaryDescriptor;

And a single line is written like so

1             überglücklich <-> blissful                     (0) 

The curr_dict_ seems to import fine, but when outputting it I get a whole bunch of garbage characters at the end of the file!

I even used a hex editor to make sure that my file containing the dictionary didn't contain excess characters at the end. It didn't.

The registry file the top code is reading for the dictionary:

Christian.reg

Christian,abc123,10WS_PG2_P4_de_en_gefuehle.txt

What am I doing wrong?

IAE
  • 2,093
  • 11
  • 35
  • 66
  • 2
    This statment leaks: `char *buffer = new char[length];` prefer `std::vector buffer(length);` – Martin York Dec 07 '10 at 17:32
  • 2
    Or better yet, don’t read into a char buffer, [read directly into a string](http://stackoverflow.com/questions/116038/what-is-the-best-way-to-slurp-a-file-into-a-stdstring-in-c/116220#116220). (As a bonus, that would also have prevented this particular bug …) – Konrad Rudolph Dec 07 '10 at 17:36
  • @Martin: Thanks, and fixed. @Rudolph: Looks pretty sweet; I'll try to integrate that. – IAE Dec 07 '10 at 17:37

2 Answers2

3

the read function (as in the line file.read(buffer, length);) does not nul-terminate the character buffer. You'll need to manually do it (allocate one more character, and place the nul at the gcountth position after reading).

lijie
  • 4,621
  • 19
  • 25
  • So I actually haven't used gcount yet, but from what cplusplus tells me, it simply returns the position of the last character read. Sweet! How do I use this to read the null terminator to a char *? – IAE Dec 07 '10 at 17:42
  • 1
    you'll need to put it in manually (i.e. something like `buffer[file.gcount()] = 0`. – lijie Dec 07 '10 at 18:31
  • 1
    which is also why you'll need to allocate one extra character (when using the character buffer or the vector approach). – lijie Dec 07 '10 at 18:36
1

I would do this:

std::string strBuffer(length, '\0');
myread(file, &strBuffer[read], length); // guranteed to read length bytes from file into buffer

Avoid the need for an intermediate buffer completely.

Martin York
  • 234,851
  • 74
  • 306
  • 532
  • -1: relying on internals of the std::string implementation. Any std::string implemented by non-contiguous storage would fail. See http://stackoverflow.com/questions/760790/is-it-legal-to-write-to-stdstring – Zan Lynx Dec 08 '10 at 02:08
  • @Zan Lynx: I beg to differ. 1) C++03 does require that &str[0] return a pointer to [contiguous storage](http://herbsutter.wordpress.com/2008/04/07/cringe-not-vectors-are-guaranteed-to-be-contiguous/#comment-483) 2) The string length will not be affected by the read (as string maintains the length separately from the string data (i.e. it is not reliant on the string being '\0' terminated)). The reason is that data() c_str() and operator[] are done like this is to allow (but not require) the implementation to provide a reference counted version of string. So please remove your erroneous -1. – Martin York Dec 08 '10 at 13:24
  • Very well, the operator[] will return contiguous storage. However, the length is still wrong. It will be set to the length of the file but there is no guarantee the read actually read that much data. – Zan Lynx Dec 08 '10 at 16:32
  • I assume that the user is aware that read may return before completion and thus require a loop to read the full file. That is a different problem. The example code is just that (as is the above) code which does not of error checking either and thus the original code provides the exact **same** functionality that the OP provided. But just to be pedantic I have added the required loop to actually guarantee that file is read into a buffer. – Martin York Dec 08 '10 at 17:04
  • Should also consider the case that the file contents were rewritten between getting the file length and reading the data. – Zan Lynx Dec 08 '10 at 21:27