I would like to read data from an "application/octet-stream; charset=binary" file with fread on Linux and convert it to UTF-8. I tried iconv, but it doesn't support a "binary" charset, and I haven't found any other solution yet. Can anyone help me with this?

Thanks.

Somnath Musib
zuubs
  • Is the text in the file already in UTF-8 encoding? Then there's no need to do any kind of conversion. Also, if you're programming in C++, why use the old C functions? – Some programmer dude Sep 02 '14 at 13:55
  • It's encoded in "application/octet-stream charset=binary" (file -bi file). The old C function is used, because the reading part is done with sources written in C, the characters will be given afterwards to a C++ function which does the rest. In the C++ side, iconv is used for converting to UTF-8. – zuubs Sep 02 '14 at 14:03
  • What exactly is meant by "binary" encoding? There is no standard "binary" encoding for textual data. Or rather, as all files and data ultimately are stored as binary ones and zeroes, *all* encodings can be considered to be "binary". Are you sure that the received data is actually a text file? Then you need to know the actual encoding. – Some programmer dude Sep 02 '14 at 14:08
  • According to the MIME type printed, it doesn't seem to be textual. These are the characters that I can see in gedit: "001007r¢vÃUÍÿqKWAÆñ}ýtdG÷R]". – zuubs Sep 02 '14 at 14:18
  • If it is not textual, then you cannot expect to convert it to UTF-8. Converting between encodings only makes sense for textual data. – MariusSiuram Sep 02 '14 at 14:25

1 Answer

According to the MIME type that you've given, you're reading data that's in a non-textual binary format. You cannot convert it with iconv or a similar tool, because those are meant for converting text from one character encoding to another. If your data is not text, then converting it to any character encoding is meaningless: it will corrupt the data without making it any more readable.

The typical way to present binary data as readable text for inspection is a hex dump. There's an existing answer for implementing one in C++: https://stackoverflow.com/a/16804835/2079303
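
As a rough illustration of the idea (not the code from the linked answer), a hex dump could be sketched as below. The helper names `read_bytes` and `hex_dump` are hypothetical, and the layout (an 8-digit hex offset followed by 16 bytes per line) is just one common choice. The file is read with fread in "rb" mode, so the bytes are passed through untouched.

```cpp
#include <cstddef>
#include <cstdio>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read a whole file as raw bytes using fread.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> read_bytes(const char* path) {
    std::vector<unsigned char> data;
    std::FILE* f = std::fopen(path, "rb");  // "rb": binary mode, no translation
    if (!f) return data;
    unsigned char buf[4096];
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        data.insert(data.end(), buf, buf + n);
    }
    std::fclose(f);
    return data;
}

// Hypothetical helper: format bytes as lowercase hex, 16 bytes per line,
// each line prefixed with the offset of its first byte.
std::string hex_dump(const std::vector<unsigned char>& data) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < data.size(); i += 16) {
        out << std::setw(8) << i << ' ';
        for (std::size_t j = i; j < i + 16 && j < data.size(); ++j) {
            // Cast to unsigned so the byte prints as a number, not a character.
            out << ' ' << std::setw(2) << static_cast<unsigned>(data[j]);
        }
        out << '\n';
    }
    return out.str();
}
```

With these helpers, something like `std::cout << hex_dump(read_bytes("file"));` (plus `#include <iostream>`) would print the dump. For the three bytes 0x00, 0xff, 0x41 the output line is `00000000  00 ff 41`.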

Community
eerorika