Your input file is likely encoded in a multi-byte charset. It does not appear to be UTF-8, though: — is encoded in UTF-8 as bytes E2 80 94, which displays as â€” when interpreted in Latin-1 (more precisely, Windows-1252), and ’ is encoded in UTF-8 as bytes E2 80 99, which displays as â€™ when interpreted the same way. That is not what you are seeing in your output, but the symptom is similar: you are reversing the encoded chars in the string as-is, which will not work for a multi-byte encoding.
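To make the failure concrete, here is a minimal sketch of what a byte-wise reversal does to the UTF-8 bytes mentioned above. The helper names reverse_bytes and to_hex are made up for this illustration and are not from your code.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: reverses the raw bytes of a string,
// which is what std::reverse() on a std::string does.
std::string reverse_bytes(std::string s) {
    std::reverse(s.begin(), s.end());
    return s;
}

// Hypothetical helper: formats the bytes of a string as uppercase hex,
// separated by single spaces.
std::string to_hex(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : s) {
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];
        out += digits[byte & 0x0F];
    }
    return out;
}
```

Reversing the bytes 61 E2 80 94 62 (an "a", the dash, and a "b") yields 62 94 80 E2 61. The three bytes of the dash now come in the order 94 80 E2, which is not a valid UTF-8 sequence, so a viewer will show garbage in its place.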
To properly reverse a multi-byte encoded string, you have to know the encoding beforehand and walk through the string according to that encoding, extracting each whole sequence of code units and writing each sequence to the output intact, rather than reading and writing the individual chars as-is. std::reverse() can't help you with that unless you use iterators that know how to read and write those whole sequences.
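As a rough sketch of that walk, assuming the input really is UTF-8 (which may not be true for your file), something like the following keeps each encoded sequence intact while reversing their order. The function names are hypothetical, and no validation is done beyond looking at the lead byte.

```cpp
#include <cstddef>
#include <string>

// Length of the UTF-8 sequence that starts with this lead byte.
// Invalid lead bytes are treated as 1-byte units so the walk always advances.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 1;                             // stray continuation or invalid byte
}

// Reverses the order of the encoded sequences, copying each one unchanged.
std::string reverse_utf8(const std::string& in) {
    std::string out(in.size(), '\0');
    std::size_t read = 0;
    std::size_t write = in.size();  // fill the output from the back
    while (read < in.size()) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(in[read]));
        if (len > in.size() - read) len = in.size() - read;  // truncated tail
        write -= len;
        in.copy(&out[write], len, read);  // copy the whole sequence as-is
        read += len;
    }
    return out;
}
```

This only keeps code points whole. Combining marks and other multi-codepoint grapheme clusters would still end up in the wrong order.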
If you know the encoding beforehand, you may have better luck using std::wifstream/std::wofstream instead, imbue()'d with a suitable std::locale for the encoding, together with std::wstring instead of std::string. However, on Windows at least, where std::wstring uses UTF-16, you still have to deal with multi-unit sequences (though less frequently, unless you are dealing with East Asian languages). So you may have to convert the decoded UTF-16 input to UTF-32 before reversing (and then you have to deal with multi-codepoint grapheme clusters), then convert the UTF-32 back to UTF-16 before saving it, encoded, to the output file.
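The UTF-16 to UTF-32 round trip could look roughly like this. It is only a sketch: it uses std::u16string and std::u32string rather than std::wstring so the unit sizes are the same on every platform, the function names are invented for this example, lone surrogates are passed through untouched, and grapheme clusters are not handled at all.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Decodes UTF-16 code units into UTF-32 code points, joining surrogate pairs.
std::u32string utf16_to_utf32(const std::u16string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t unit = in[i];
        bool is_pair = unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size() &&
                       in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF;
        if (is_pair) {
            char32_t low = in[++i];
            out.push_back(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00));
        } else {
            out.push_back(unit);  // BMP code point, or a lone surrogate kept as-is
        }
    }
    return out;
}

// Encodes UTF-32 code points back into UTF-16, splitting into surrogate pairs.
std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    for (char32_t cp : in) {
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Reverses a UTF-16 string one code point at a time.
std::u16string reverse_code_points(const std::u16string& in) {
    std::u32string code_points = utf16_to_utf32(in);
    std::reverse(code_points.begin(), code_points.end());
    return utf32_to_utf16(code_points);
}
```

With this, a character outside the BMP such as U+1F600 (code units D83D DE00) comes out of the reversal with its two units still in the right order.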
Also, if you are going to handle the individual chars as-is, you should open the files in binary mode and use unformatted input/output operations (i.e., no operator>> or operator<<) to ensure the raw chars are read and written correctly:
ifstream ifs(name, std::ios::binary);
if (!ifs) throw runtime_error("Couldn't open input file.");
ofstream ofs("output.txt", std::ios::binary);
if (!ofs) throw runtime_error("Couldn't open output file.");
// Note: there are easier ways to read a file into a std::string!
// See: https://stackoverflow.com/questions/116038/
string s;
for (char ch; ifs.get(ch);)
    s.push_back(ch);
reverse(s.begin(), s.end());
for (char ch : s)
    ofs.put(ch);
// alternatively:
// ofs.write(s.c_str(), s.size());
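One of the easier ways to read the whole file, sketched here with std::istreambuf_iterator, might look like this. The function names read_file_bytes and write_file_bytes are hypothetical; both use binary mode and unformatted I/O only, so the bytes pass through unchanged.

```cpp
#include <fstream>
#include <ios>
#include <iterator>
#include <stdexcept>
#include <string>

// Reads the entire file into a std::string without any translation.
std::string read_file_bytes(const std::string& path) {
    std::ifstream ifs(path, std::ios::binary);
    if (!ifs) throw std::runtime_error("Couldn't open input file.");
    return std::string(std::istreambuf_iterator<char>(ifs),
                       std::istreambuf_iterator<char>());
}

// Writes the bytes of data to the file without any translation.
void write_file_bytes(const std::string& path, const std::string& data) {
    std::ofstream ofs(path, std::ios::binary);
    if (!ofs) throw std::runtime_error("Couldn't open output file.");
    ofs.write(data.data(), static_cast<std::streamsize>(data.size()));
}
```

Because the streams are opened in binary mode, line endings and embedded zero bytes survive the round trip as they are.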