1

Why does C++ create a Unicode file when I write a structure like this to a file? Part of the code:

struct stEjemplo
{
    char cadena1[9];
    char cadena2[9];
};

Whatever I write to cadena1 and cadena2, the file shows something like this:

㈱㐳㘵㠷㠀㘷㐵㈳o

Example:

fstream file("File.dat");
if(!file.is_open())
{
    file.open("File.dat", ios::in | ios::out | ios::trunc);
}
stEjemplo somest = {0};
strcpy(somest.cadena1, "SomeText");
strcpy(somest.cadena2, "SomeText");
file.clear();
file.seekg(0,ios::beg); //ios::end if existing information
file.write(reinterpret_cast< char*>(&somest), sizeof(stEjemplo));
file.close();

This results in:

潓敭敔瑸匀浯呥硥t

Note the "t" at the end (it is the final "t" of the second "SomeText").

But if my structure was:

struct stEjemplo
{
    int number; //then I assign 1324
    char cadena1[9];
    char cadena2[9];
};

The result is: , SomeText SomeText. Or, with:

struct stEjemplo
{
    bool x; //then I assign true or false
    char cadena1[9];
    char cadena2[9];
};

the result would be something like: SomeText SomeText

EDIT:

If the 00 (NUL character) shown in the hex editor falls at an odd offset (counting from 0, for example 1, 3, 5, 7, 9, and so on), I have the problem. If the 00 falls at an even offset and is not preceded by another 00, the problem goes away.

Skl
  • Why a reinterpret_cast? – rubenvb Jul 02 '14 at 05:52
  • Similar to (char *) in C. reinterpret_cast is a C++-style cast. – Skl Jul 02 '14 at 06:01
  • check for struct alignment and `#pragma pack` – pepper_chico Jul 02 '14 at 06:02
  • The compiler is free to align members as it wishes if you don't say otherwise. – pepper_chico Jul 02 '14 at 06:03
  • Could you give me an example? I tried it, but the result is the same. – Skl Jul 02 '14 at 06:14
  • This seems like a locale issue. Try to imbue classic locale for encoding facet. – ALittleDiff Jul 02 '14 at 06:24
  • If you use `ostream::write` to write a `struct` to a file, use binary mode `ios::binary` and use `istream::read` to read it back. The resulting file would be binary data, not Unicode and not ASCII (there is no such thing as "ANSI file"). It would make little sense to open it in a text editor. If you want a text file, open in text mode and write strings. – n. 'pronouns' m. Jul 02 '14 at 06:49
  • Funny thing: if I do the same in another part of the project, it works as it should, which is weird. As for ANSI and Unicode files, I mentioned them because Notepad (Windows 7, by the way) lists these encodings: ANSI, Unicode, Unicode big endian, UTF-8. I normally use binary files, but this time I happened to use a text file. Let me try it and see how it goes. – Skl Jul 02 '14 at 06:57
  • On ANSI: http://stackoverflow.com/questions/701882/what-is-ansi-format – n. 'pronouns' m. Jul 02 '14 at 07:06
  • @user3793540 ah yes, you're passing the structure. I would dare to say this is undefined behaviour because there may be padding added to the struct which you are ignoring. – rubenvb Jul 02 '14 at 07:10

3 Answers

3

You are opening File.dat in your text editor as UTF-16LE when it quite clearly isn't. Open it as plain ASCII or UTF-8 (or even use a hex editor) and you should see the strings.

潓敭敔瑸匀浯呥硥t corresponds to the UTF-16LE sequence

53 6F 6D 65 54 65 78 74 00 53 6F 6D 65 54 65 78 74 00

Guess what this is when read as plain ASCII / UTF-8?

user657267
  • @user657257 Yes, but I don't know why GCC inserts an "ff fe" at the beginning of cadena1[] if I fill the whole string! It's the first time I've had a problem like this. – Skl Jul 02 '14 at 09:29
  • @Skl GCC isn't adding the BOM, your editor is (Notepad presumably?). Delete `File.dat`, run the app again, and read the file with a hex editor or another text editor like Notepad++. – user657267 Jul 02 '14 at 09:33
  • Yes, I'm using Notepad (Windows 7). I also deleted all the code that handles the structure's contents and the file, rewrote it in a more orderly way, and erased the file. I ran the program and now it shows: 潓敭敔瑸匀浯呥硥t (Notepad displays rectangles); my structure is the same. I opened the file in the hex editor grouped by bytes, and there is no BOM. Grouped by words (two bytes each), it looks the same as in Notepad (rectangles). Then I opened it in Notepad++ and it shows: SometextNULSometextNUL. I guess the problem is solved. Thanks. – Skl Jul 03 '14 at 22:32
0

This is a bad way of handling things. It may even be undefined behavior (due to padding of the struct members).

It would be better to write serialization code for your struct:

#include <cstring>
#include <fstream>
#include <iostream>
#include <string>

struct stEjemplo
{
    char cadena1[9];
    char cadena2[9];
};

std::ostream& operator<<(std::ostream& os, const stEjemplo& e)
{
  return os << e.cadena1 << ' ' << e.cadena2;
}

int main()
{
  stEjemplo somest = {};
  std::strcpy(somest.cadena1, "SomeText");
  std::strcpy(somest.cadena2, "SomeText");
  std::ofstream file("File.dat", std::ios::trunc);
  if(!file)
  {
    std::cout << "Failed opening file.\n";
    return 1; // nothing to write to, so stop here
  }
  file << somest;
  file.close();

  // no error checking, assuming all will go well
  std::ifstream test("File.dat");
  std::string contents;
  std::getline(test, contents);
  std::cout << contents;
}

Live demo here.

Also:

  • consider using std::string instead of char[] and strcpy.
  • consider using std::ios::binary when writing raw data like encoded strings.
rubenvb
  • That's right, but I load the file's contents into a vector (examples) with push_back(), work with the vector, delete the file or open it with ios::trunc, and then write the vector's contents back to the file. I use this method with another structure and it works perfectly. – Skl Jul 02 '14 at 07:32
  • @user3793540 the fact that the method "works" elsewhere by no means means it is otherwise correct. – rubenvb Jul 02 '14 at 07:36
  • I opened the file in binary mode, used ifstream to read and ofstream to write the vector, and the same thing keeps happening. – Skl Jul 02 '14 at 08:01
0

Known Notepad bug. Not your fault.

MSalters
  • I think this is why the characters appeared that way. Thanks, I had lost a whole day thinking it was my mistake. – Skl Jul 03 '14 at 23:04