I am assigned a task where I have to explain why PrintStream and DataOutputStream produce two different kinds of output files (which I know: the first writes a string representation byte by byte, whilst the second writes the raw binary data). To elaborate on the background, I wanted to write a small C++ program that reads the written data back from the file and prints it to stdout.

The idea is simple: write short values from 20,000 to 32,000 to a file with DataOutputStream, using its writeShort(int) method. According to the Java documentation, each of those values is written as two bytes.

Now... I did try to implement this with std::ifstream on the C++ side, and I believe I ran into some endianness-related issues. According to what I have gathered from various SO questions, Java writes in "network format", which is apparently another name for "Little Endian". But as far as I am aware, my Mac (MacBook, mid-2014) uses "Big Endian", so the bytes are in the wrong order.

This is what I have come up with so far:

#include <iostream>
#include <fstream>

using namespace std;

int main(int argc, char** argv) {
  ifstream fh("./out.DataOutputStream.dat", ios::in|ios::binary);
  if(!fh.is_open()) {
    cerr << "Error while opening file." << endl;
    cerr << "Are you in the same directory as <out.DataOutputStream.dat>?" << endl;
    return 1;
  }

  cout << "--- Begin of data ---" << endl;
  char num1, num2;
  #define SWAP(b) ( (b >> 8) | (b << 8) )
  while(!fh.eof()) {
    fh.read(&num1, 1); // read one byte
    fh.read(&num2, 1); // read the next byte

    cout << (unsigned short)SWAP(num2) << (unsigned short)SWAP(num1);
  }
  cout << flush;
  cout << "--- End of data ---" << endl;

  return 0;
}

This does print 32000 at the (very) end... but it prints it twice, and everything else is completely off. Any idea how I can get this to work with the STL only?

Flimzy
Ingwie Phoenix
  • Network format is big endian. – Eljay Feb 05 '19 at 21:40
  • Unrelated: `while(!fh.eof())` [can be a real buzz-kill](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong). – user4581301 Feb 05 '19 at 21:41
  • @user4581301 Oh wow - I didn't know that! I'll fix this by using `.read(...)` instead, once I know what I actually can read, or how to read the two bytes I need. – Ingwie Phoenix Feb 05 '19 at 21:51
  • @Eljay I see. But which endianness then applies to my Mac? Or is the x86 instruction set always "little endian"? – Ingwie Phoenix Feb 05 '19 at 21:52
  • Depends on your Mac. 68000 was always big endian, PowerPC was flippy - it could be big endian or little endian. Intel 80x86 is little endian. When the ARM based Mac comes out, I'm not sure. But I don't have to care... that's why we have `ntoh` and `hton`. – Eljay Feb 05 '19 at 21:56
  • @Eljay Ah, okay. I have an Intel-based Mac, which reports itself as `x86_64`. So I believe it is little endian, then. Thanks. :) – Ingwie Phoenix Feb 05 '19 at 21:59
  • It shouldn't matter what the host's endian actually is. The file is written with a specific endian (big endian). So you know the exact byte order and can read the bytes individually and recombine in the correct order regardless of endian, or simply read the whole 2-byte value and then use `ntohs()` to ensure it matches the host's endian. – Remy Lebeau Feb 05 '19 at 23:17

0 Answers