
As per the API, these are the facts:

  • Simply put, the seek(long bytePosition) method moves the file pointer to the position given by the bytePosition parameter.
  • When bytePosition is greater than the file length, the file length does not change unless a byte is written at the (new) end.
  • If data is present in the range skipped over, that data is left untouched.
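The first two facts can be checked directly. A minimal sketch, assuming a fresh, empty temporary file (the class and method names are hypothetical): it records the length right after seek() and again after writing a single byte at that position.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekPastEnd {
    // Returns {length right after seek, length after writing one byte at that position}.
    static long[] lengthsAfterSeekAndWrite(File f, long pos) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(pos);                 // only moves the file pointer
            long afterSeek = raf.length(); // length is still unchanged here
            raf.write(0);                  // writing one byte at pos extends the file to pos + 1
            long afterWrite = raf.length();
            return new long[] {afterSeek, afterWrite};
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp file; createTempFile creates it empty (0 bytes).
        File f = File.createTempFile("seek-demo", ".bin");
        f.deleteOnExit();
        long[] lens = lengthsAfterSeekAndWrite(f, 100_000 - 1);
        System.out.println("length after seek:  " + lens[0]);
        System.out.println("length after write: " + lens[1]);
    }
}
```

On an empty file, the first length should be 0 and the second 100,000.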

However, the situation I'm curious about is: When there is a file with no data (0 bytes) and I execute the following code:

file.seek(100000-1);
file.write(0);

All 100,000 bytes are filled with 0 almost instantly. I can create a file of over 200 GB in, say, 10 ms.

But when I try to write 100,000 bytes using other methods, such as a BufferedOutputStream, the same process takes far longer.
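For comparison, here is a sketch of the explicit approach (class and method names are hypothetical). Even with a reused zero-filled buffer, every one of the n bytes is passed to the OS, so the work grows with n, and on a typical filesystem (assuming no compression or zero-block detection) real disk blocks get allocated for all of them.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class ExplicitZeros {
    // Writes n zero bytes explicitly; every byte goes through the stream to the OS.
    static void writeZeros(File f, long n) throws IOException {
        byte[] chunk = new byte[8192]; // new Java arrays are zero-initialized
        try (OutputStream out = new BufferedOutputStream(new FileOutputStream(f))) {
            long remaining = n;
            while (remaining > 0) {
                int len = (int) Math.min(chunk.length, remaining);
                out.write(chunk, 0, len);
                remaining -= len;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("zeros-demo", ".bin"); // hypothetical temp file
        f.deleteOnExit();
        writeZeros(f, 100_000);
        System.out.println("length: " + f.length());
    }
}
```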

What is the reason for this difference in time? Is there a more efficient way to create a file of n bytes and fill it with 0s?

EDIT: If the data is not actually written, how is the file filled with data? Consider this code:

RandomAccessFile out=new RandomAccessFile("D:/out","rw");
out.seek(100000-1);
out.write(0);
out.close();
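A self-contained variant of the snippet above that also reads the file back, to confirm that the skipped-over bytes come back as zeros. It is a sketch that assumes a fresh temporary file in place of D:/out (the class and method names are hypothetical); if the file already contained data, the gap would keep that data instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekWriteReadBack {
    // Creates a `size`-byte file the way the question does, then reads every byte back.
    // Assumes f starts out empty. Returns how many bytes are non-zero.
    static int createAndCountNonZero(File f, int size) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.seek(size - 1); // jump past the (empty) end of the file
            out.write(0);       // the only byte the program actually writes
        }
        byte[] buf = new byte[size];
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            in.readFully(buf);  // read the whole file, including the skipped-over gap
        }
        int nonZero = 0;
        for (byte b : buf) {
            if (b != 0) nonZero++;
        }
        return nonZero;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("readback-demo", ".bin"); // stands in for D:/out
        f.deleteOnExit();
        int nonZero = createAndCountNonZero(f, 100_000);
        System.out.println("length: " + f.length() + ", non-zero bytes: " + nonZero);
    }
}
```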

This is the output:

[Screenshot: Output]

Also, if the file is large enough, I can no longer write to the disk because it runs out of space.

SirVirgin
  • My guess is that the file size is "noted", but the actual blocks are not written to disk. How long does the flush/close take? (see here http://stackoverflow.com/a/257849/540873) – Thomas Jungblut Feb 23 '17 at 17:03
  • My guess was the same, but I did open the file and check it. When I didn't write the last byte, it was empty and the resulting file size was 0 bytes. When I did write the last byte, every byte up to the last was 0 and the file size was as specified. The entire process, including the close() call, takes the time stated in the question (which is why I'm amazed!) – SirVirgin Feb 23 '17 at 17:07
  • What did you not understand when you read the source code for that method? You did read the source before asking someone else to read it and do your work for you, didn't you? –  Feb 23 '17 at 17:16
  • @JarrodRoberson it is a native method. – Gray Feb 23 '17 at 17:20
  • How the OS handles this is very dependent on which OS @RangaRajan. As to _how_ the data is written, this is explained in the answers below: it's all 0s, which are handled differently if the file is sparse. – Gray Feb 23 '17 at 17:21
  • @JarrodRoberson If you would point me to the source code, I would be most grateful, good sir. – SirVirgin Feb 23 '17 at 17:23
  • @Gray thanks, I'm following up on sparse files (From the answers posted) – SirVirgin Feb 23 '17 at 17:24

2 Answers


When you write 100,000 bytes to a BufferedOutputStream, your program is explicitly accessing each byte of the file and writing a zero.

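When you use RandomAccessFile.seek() on a local file, you are indirectly using the operating system's seek operation (comparable to the C function fseek(); on Unix-like systems the JDK's native code calls the lseek() system call). How that gets handled depends on the operating system.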

In most modern operating systems, sparse files are supported. This means that if you ask for an empty 100,000 byte file, 100,000 bytes of disk space are not actually used. When you write to byte 100,001, the OS still doesn't use 100,001 bytes of disk. It allocates a small amount of space for the block containing "real" data, and keeps track of the empty space separately.

When you read a sparse file, for example, by fseek()ing to byte 50,000, then reading, the OS can say "OK, I have not allocated disk space for byte 50,000 because I have noted that bytes 0 to 100,000 are empty. Therefore I can return 0 for this byte.". This is invisible to the caller.
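The read described above can be tried from Java. A sketch with hypothetical names: it builds a 100,000-byte file by seeking and writing a non-zero marker at the end (so the gap is easy to tell apart), then seeks straight to byte 50,000 and reads it. Whether the OS backs the gap with a hole or with real zero blocks is invisible here, as the answer says.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class ReadInsideGap {
    // Creates a file of `size` bytes whose last byte is 1 and whose earlier bytes were skipped over.
    static void createWithMarker(File f, long size) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.seek(size - 1);
            raf.write(1);
        }
    }

    // Reads the single byte at `offset`.
    static int byteAt(File f, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "r")) {
            raf.seek(offset);  // direct jump: the bytes before offset are not read
            return raf.read(); // 0..255, or -1 at end of file
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("gap-demo", ".bin"); // hypothetical temp file
        f.deleteOnExit();
        createWithMarker(f, 100_000);
        System.out.println("byte at 50,000: " + byteAt(f, 50_000));
        System.out.println("byte at 99,999: " + byteAt(f, 99_999));
    }
}
```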

This has the dual purpose of saving disk space, and improving speed. You have noticed the speed improvement.

More generally, fseek() goes directly to a position in a file, so it's O(1) rather than O(n). If you compare a file to an array, it's like doing x = arr[n] instead of for(i = 0; i<=n; i++) { x = arr[i]; }

This description, and the one on Wikipedia, is probably sufficient to understand why seeking to byte 100,000 and then writing is faster than writing 100,000 zeros. If you want to dig deeper, you can read the Linux kernel source code to see how sparse files are implemented, and the RandomAccessFile source code in the JDK and the JRE to see how they interact. However, this is probably more detail than you need.

slim
  • This answer handles your edit as well @RangaRajan. The data is 0s which is handled by the sparse file. – Gray Feb 23 '17 at 17:23
  • Thanks for the answers – SirVirgin Feb 23 '17 at 17:27
  • @slim Would a sparse file overwrite deleted data as well? (data deleted but not yet overwritten) – SirVirgin Feb 23 '17 at 17:33
  • From this description I understand that it doesn't, but I experimented and found that I could not recover a file using Piriform's Recuva (Windows 10). – SirVirgin Feb 23 '17 at 17:41
  • If you have another question, ask it as a new question - but you will have to explain what you mean by "overwrite deleted data" and "data deleted but not yet overwritten". Also it may belong on a different Stack Exchange site, depending on which OS you are asking about. – slim Feb 23 '17 at 18:25

Your operating system and filesystem support sparse files, and when that is the case, seek is implemented to take advantage of this feature.

This is not really specific to Java; it's a feature of the fseek and fwrite functions from the C library (and the system calls beneath them), which are most likely what backs the file implementation in the JRE you are using.
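Since the behavior comes from the OS, other Java APIs reach it too. A sketch using NIO instead of RandomAccessFile (names are hypothetical): FileChannel.write(ByteBuffer, long) writes at an absolute position and grows the file, and StandardOpenOption.SPARSE, used with CREATE_NEW, is only a hint that the new file should be sparse. It matters mainly on filesystems such as NTFS where files are not sparse by default, and it is ignored where sparse files are unsupported.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelGrow {
    // Creates a new file at p and grows it to `size` bytes with one positional write.
    static long growSparse(Path p, long size) throws IOException {
        try (FileChannel ch = FileChannel.open(p,
                StandardOpenOption.CREATE_NEW, // SPARSE is only considered together with CREATE_NEW
                StandardOpenOption.WRITE,
                StandardOpenOption.SPARSE)) {  // a hint; the filesystem may ignore it
            ch.write(ByteBuffer.wrap(new byte[] {0}), size - 1); // write one byte past the current end
            return ch.size();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sparse-demo");
        Path p = dir.resolve("out.bin"); // hypothetical file name
        try {
            System.out.println("size: " + growSparse(p, 100_000));
        } finally {
            Files.deleteIfExists(p);
            Files.deleteIfExists(dir);
        }
    }
}
```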

more info: https://en.wikipedia.org/wiki/Sparse_file

Is there a more efficient way to create a file of n bytes and fill it with 0s?

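On operating systems that support it, you could truncate the file to the desired size instead of issuing a write call. In Java, RandomAccessFile.setLength(long) does this: it can extend a file as well as shorten it, although its Javadoc says the contents of the extended portion are not defined.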
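A minimal sketch of the truncate approach via setLength (class and method names are hypothetical). Because the Javadoc leaves the contents of the extended region undefined, code that must rely on zeros should not assume them from this call alone, even though common platforms do return zeros there.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SetLengthDemo {
    // Sets the file length to n without writing any data; returns the resulting length.
    static long extendTo(File f, long n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.setLength(n); // extends (or truncates) the file to exactly n bytes
            return raf.length();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("setlength-demo", ".bin"); // hypothetical temp file
        f.deleteOnExit();
        System.out.println("length: " + extendTo(f, 100_000));
    }
}
```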

Display Name
  • Would you be so kind as to elaborate? Also, how is the seek() method implemented? Is there a way I can see how built-in library methods are implemented? – SirVirgin Feb 23 '17 at 17:04
  • @RangaRajan this is suitable for posting as another question. – Display Name Feb 23 '17 at 17:22
  • Thanks for the answers – SirVirgin Feb 23 '17 at 17:29
  • @Gray Would a sparse file overwrite deleted data as well? (data deleted but not yet overwritten) – SirVirgin Feb 23 '17 at 17:33
  • A sparse file has to be created by seeking. If you write a 0 or any other byte to a region of the file, that region isn't sparse. Seeking around a file doesn't delete anything; it just moves the file pointer around @RangaRajan. – Gray Feb 25 '17 at 05:28