24

I have written the following code which writes 4000 bytes of 0s to a file test.txt. Then, I read the same file in chunks of 1000 bytes at a time.

FileOutputStream output = new FileOutputStream("test.txt");
ObjectOutputStream stream = new ObjectOutputStream(output);

byte[] bytes = new byte[4000];

stream.write(bytes);
stream.close();

FileInputStream input = new FileInputStream("test.txt");
ObjectInputStream s = new ObjectInputStream(input);


byte[] buffer = new byte[1000];
int read = s.read(buffer);

while (read > 0) {
    System.out.println("Read " + read);
    read = s.read(buffer);
}

s.close();

What I expect to happen is to read 1000 bytes four times.

Read 1000
Read 1000
Read 1000
Read 1000

However, what actually happens is that I seem to get "paused" (for lack of a better word) every 1024 bytes.

Read 1000
Read 24
Read 1000
Read 24
Read 1000
Read 24
Read 928

If I try to read more than 1024 bytes at a time, each read is capped at 1024 bytes. If I try to read fewer than 1024 bytes at a time, I still have to pause at every 1024-byte mark.

Upon inspecting the output file test.txt in a hex editor, I noticed a 5-byte sequence, 7A 00 00 04 00, recurring every 1029 bytes, despite the fact that I have written only 0s to the file. Here is the output from my hex editor. (It would be too long to fit in the question.)

So my question is: why are these five bytes appearing in my file when I have written nothing but 0s? Do these 5 bytes have something to do with the pause that occurs every 1024 bytes? Why is this necessary?

Zsw
  • `InputStream.read(byte[])` doesn't guarantee that it reads as much as possible. The "short reading" behavior you're describing is perfectly legal, even for file-based input. So if you want to read the entire buffer, use `DataInput.readFully(byte[])`. – Nayuki Nov 29 '15 at 06:29
  • @NayukiMinase I understand. However, could you please explain why it pauses at the 1024 byte mark? For example, it would make sense to me if I could only read at most 1024 bytes, and my result will be 1024, 1024, 1024, 928. But I am confused as to why it reads the first 1000 bytes just fine, but then it can only read the next 24 bytes before continuing again. This seems entirely arbitrary? Or is there a reason for this? – Zsw Nov 29 '15 at 06:33
  • I don't know. But I'm puzzled by the 5-byte sequence that you mentioned, because neither the Javadoc for ObjectInputStream nor the one for ObjectOutputStream says anything about the formatting of byte arrays. – Nayuki Nov 29 '15 at 06:34
  • @NayukiMinase I copied the output from my hex editor to the following url: http://pastebin.com/psGtEjVr (Would be too long to fit in the question). You can see that the output is mostly `00`, but every 1029 bytes there is a `7A 00 00 04 00` added in. – Zsw Nov 29 '15 at 06:37
  • @NayukiMinase It's a Block Data mark, documented in the Object Serialization Specification. – user207421 Nov 29 '15 at 06:45

3 Answers

18

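The object streams use an internal 1024-byte buffer and write primitive data in chunks of that size, in blocks of the stream headed by Block Data markers, which are, guess what, 0x7A followed by a 32-bit length word (or 0x77 followed by an 8-bit length word). Your 7A 00 00 04 00 is therefore a marker followed by the length 0x00000400 = 1024. So a single read() call can only ever return a maximum of 1024 bytes, and it never crosses a block boundary, which is why you get 1000 bytes and then the remaining 24.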
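To see those markers without a hex editor, here is a minimal sketch that serializes zero bytes into memory and walks the result block by block. The class and method names (`BlockDataInspector`, `serializeZeros`, `blockHeaderOffsets`) are made up for illustration, an in-memory `ByteArrayOutputStream` stands in for the file, and the walker assumes the stream contains nothing but a stream header followed by block data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class BlockDataInspector {

    // Marker values as described above: 0x7A precedes a 32-bit length,
    // 0x77 precedes an 8-bit length.
    static final int LONG_BLOCK_MARKER = 0x7A;
    static final int SHORT_BLOCK_MARKER = 0x77;

    /** Writes `count` zero bytes with ObjectOutputStream.write(byte[]) and returns the raw stream. */
    static byte[] serializeZeros(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.write(new byte[count]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Returns the offset of every block-data header found in the raw stream. */
    static List<Integer> blockHeaderOffsets(byte[] raw) {
        List<Integer> offsets = new ArrayList<>();
        int pos = 4; // skip the 4-byte stream header (magic number and version)
        while (pos < raw.length) {
            int marker = raw[pos] & 0xFF;
            int headerSize;
            int length;
            if (marker == LONG_BLOCK_MARKER) {
                headerSize = 5;
                length = ByteBuffer.wrap(raw, pos + 1, 4).getInt(); // big-endian
            } else if (marker == SHORT_BLOCK_MARKER) {
                headerSize = 2;
                length = raw[pos + 1] & 0xFF;
            } else {
                break; // not block data; this sketch does not parse objects
            }
            offsets.add(pos);
            pos += headerSize + length;
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] raw = serializeZeros(4000);
        System.out.println("Stream size: " + raw.length);
        System.out.println("Block headers at: " + blockHeaderOffsets(raw));
    }
}
```

For 4000 bytes, consecutive full blocks should show up 1029 bytes apart (5 header bytes plus 1024 data bytes), which matches the spacing seen in the hex dump.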

The real question here is why you're using object streams just to read and write bytes. Use buffered streams. Then the buffering is under your control, and incidentally there's zero space overhead, unlike the object streams which have stream headers and type codes.
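As a rough illustration of the buffered-stream route, the sketch below writes the bytes through a `BufferedOutputStream` and reads them back in fixed-size chunks. The names `PlainStreamChunks`, `writePlain` and `readInChunks` are hypothetical, in-memory streams replace the file for brevity, and `InputStream.readNBytes(byte[], int, int)` assumes Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class PlainStreamChunks {

    /** Writes `count` zero bytes through a BufferedOutputStream; no headers or markers are added. */
    static byte[] writePlain(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink)) {
            out.write(new byte[count]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads `data` in chunks of up to `chunkSize` bytes and returns the size of each chunk. */
    static List<Integer> readInChunks(byte[] data, int chunkSize) {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data))) {
            int read;
            // readNBytes keeps reading until the chunk is full or the stream ends,
            // and returns 0 once nothing is left.
            while ((read = in.readNBytes(buffer, 0, chunkSize)) > 0) {
                sizes.add(read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        byte[] data = writePlain(4000);
        System.out.println("Bytes written: " + data.length);
        for (int size : readInChunks(data, 1000)) {
            System.out.println("Read " + size);
        }
    }
}
```

`readNBytes` is used instead of plain `read` because `read` is allowed to return fewer bytes than requested even on a plain stream.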

NB serialized data is not text and shouldn't be stored in files named .txt.

user207421
  • Well, regarding why I'm using Object streams, it was more so because I was trying to serialize Objects as well as bytes within the same stream. Is it better practice to create two different streams and separate them? Wouldn't that cause conflict with the object stream headers? – Zsw Nov 29 '15 at 06:54
  • Of course it would. You should certainly use the same stream, and if there are objects in it you have no choice but to use object streams. – user207421 Nov 29 '15 at 07:12
8

ObjectOutputStream and ObjectInputStream are special streams used for serialization of objects.

But when you do stream.write(bytes); you are trying to use the ObjectOutputStream as a regular stream, for writing 4000 bytes, not for writing an array-of-bytes object. When data are written like this to an ObjectOutputStream they are handled specially.

From the documentation of ObjectOutputStream:

(emphasis mine.)

Primitive data, excluding serializable fields and externalizable data, is written to the ObjectOutputStream in block-data records. A block data record is composed of a header and data. The block data header consists of a marker and the number of bytes to follow the header. Consecutive primitive data writes are merged into one block-data record. The blocking factor used for a block-data record will be 1024 bytes. Each block-data record will be filled up to 1024 bytes, or be written whenever there is a termination of block-data mode.

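This explains the behaviour you are seeing: the data is stored in block-data records of at most 1024 bytes, and each read() call returns data from only one record at a time.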

I would recommend that you either use BufferedOutputStream instead of ObjectOutputStream, or, if you really want to use ObjectOutputStream, then use writeObject() instead of write(). The same applies to input: use BufferedInputStream, or readObject() instead of read().
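If the raw bytes do have to stay in the object stream, a third option is the `readFully(byte[])` method that `ObjectInputStream` implements from `DataInput`, which keeps reading across block-data records until the buffer is full. The sketch below contrasts it with plain `read`. The class and method names are made up, and in-memory streams stand in for the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ObjectStreamChunks {

    /** Writes `count` zero bytes with ObjectOutputStream.write(byte[]). */
    static byte[] serializeZeros(int count) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
            out.write(new byte[count]);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Mirrors the loop from the question: read() stops at block-data record boundaries. */
    static List<Integer> sizesWithRead(byte[] raw, int chunkSize) {
        List<Integer> sizes = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            int read;
            while ((read = in.read(buffer)) > 0) {
                sizes.add(read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sizes;
    }

    /** Fills the buffer completely `chunkCount` times and returns how many chunks were filled. */
    static int fullChunksWithReadFully(byte[] raw, int chunkSize, int chunkCount) {
        byte[] buffer = new byte[chunkSize];
        int filled = 0;
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            for (int i = 0; i < chunkCount; i++) {
                in.readFully(buffer); // throws EOFException if fewer bytes remain
                filled++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filled;
    }

    public static void main(String[] args) {
        byte[] raw = serializeZeros(4000);
        System.out.println("read():      " + sizesWithRead(raw, 1000));
        System.out.println("readFully(): " + fullChunksWithReadFully(raw, 1000, 4) + " full chunks");
    }
}
```

Note that `readFully` needs the caller to know how many bytes to expect, since it throws `EOFException` when the data runs out mid-buffer.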

Mike Nakis
4

I suggest you use a try-with-resources statement to handle closing your resources, add buffering with BufferedInputStream and BufferedOutputStream, and then use writeObject and readObject to serialize your byte[]. Something like:

try (OutputStream output = new BufferedOutputStream(//
        new FileOutputStream("test.txt"), 8192); //
        ObjectOutputStream stream = new ObjectOutputStream(output)) {
    byte[] bytes = new byte[4000];

    stream.writeObject(bytes);
} catch (IOException ioe) {
    ioe.printStackTrace();
}

and then read it back like this:

try (InputStream input = new BufferedInputStream(//
        new FileInputStream("test.txt"), 8192); //
        ObjectInputStream s = new ObjectInputStream(input)) {
    byte[] bytes = (byte[]) s.readObject();
} catch (IOException | ClassNotFoundException ioe) {
    ioe.printStackTrace();
}

If only part of an array is to be written, you'll need to write the length as well. You can use stream.writeInt(len); before the data on the writing side and int len = s.readInt(); on the reading side.
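One way the length-prefix idea could look is sketched below, assuming the partial array is written with `write(byte[], int, int)` and read back with `readFully`. The class name `LengthPrefixedBytes` and its two helper methods are hypothetical, and in-memory streams are used in place of test.txt.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

class LengthPrefixedBytes {

    /** Writes the first `len` bytes of `data`, preceded by `len` as a 4-byte int. */
    static byte[] writePrefix(byte[] data, int len) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream stream = new ObjectOutputStream(sink)) {
            stream.writeInt(len);
            stream.write(data, 0, len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads the length, then exactly that many bytes. */
    static byte[] readPrefix(byte[] raw) {
        try (ObjectInputStream s = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            int len = s.readInt();
            // For untrusted input, len should be validated before allocating.
            byte[] bytes = new byte[len];
            s.readFully(bytes); // keeps reading across block boundaries until full
            return bytes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50};
        byte[] restored = readPrefix(writePrefix(data, 3));
        System.out.println(Arrays.toString(restored));
    }
}
```

Because the length travels in the stream, the reader does not need to know in advance how much of the array was used.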

Elliott Frisch