
I have a requirement to write records to a file where the data is written at a file location (i.e., seek position) depending on the value of a numeric key. For example, if the key is 100, I might write at position 400.

The records consist of the numeric key and a piece of data. The record won't be very large (a few bytes). However, there may be a lot of records (millions).

There are two possible scenarios:

  1. The keys are monotonically increasing. In this case, the best approach is to write using a DataOutputStream wrapping a BufferedOutputStream, setting the buffer size to some number (e.g. 64k) to maximize I/O throughput.

  2. The keys are increasing but with possible large gaps. In this case, using an OutputStream would require zeros to be written in the gaps in the file. To avoid this, a RandomAccessFile would be better, as it could seek over the gaps, saving space if it is possible to seek over an entire block. The drawback is that, as far as I know, RandomAccessFile doesn't buffer, so this method is going to be slow for sequential keys. (A sketch of both modes follows this list.)
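
For illustration, here is a minimal sketch of the two modes (the class name, the 12-byte record layout, and the key-to-offset mapping of key * RECORD_LENGTH are assumptions made for the example, not part of the requirement):

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TwoModesSketch {

    static final int RECORD_LENGTH = 12; // assumed layout: 8-byte key + 4-byte value

    // Scenario 1: keys are contiguous and monotonically increasing, so records land
    // back-to-back and a buffered sequential stream maximizes throughput.
    static void writeSequential(String path, long[] keys, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path), 64 * 1024))) {
            for (int i = 0; i < keys.length; i++) {
                out.writeLong(keys[i]);
                out.writeInt(values[i]);
            }
        }
    }

    // Scenario 2: keys may have large gaps, so seek to the key-derived offset and let
    // the filesystem leave a hole instead of writing padding zeros.
    static void writeRandomAccess(String path, long[] keys, int[] values) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            for (int i = 0; i < keys.length; i++) {
                file.seek(keys[i] * RECORD_LENGTH); // unbuffered: one system call per record
                file.writeLong(keys[i]);
                file.writeInt(values[i]);
            }
        }
    }
}

The buffered stream turns many small writes into a few 64k writes; the RandomAccessFile version issues one seek and one small write per record, which is why a hybrid of the two is attractive.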

However, the likely situation is that the file is a bit of both. There are sequences of monotonically increasing keys. There are some keys with small gaps between and others with very large gaps.

What I am looking for is a solution that gives the best of both worlds. It might be that I switch between the two I/O modes if a gap between keys is detected. However, it would be better if there is a standard Java class that can do both of these things. I have seen FileImageOutputStream, but I am not sure how this works.

Note that I am not looking for code samples (although they would be helpful for demonstrating complex solutions), just a general strategy. It would be good to know optimal buffer sizes for sequential data and at what point (gap size) you need to switch from a sequential strategy to a random-access strategy.

EDIT:

For an answer to be accepted, I would like some assurance that the proposed solution handles both, not just that it might. This would require:

  • Confirmation that the sequential mode is buffered.
  • Confirmation that the random access mode leaves holes in the file.

Also, the solution needs to be memory efficient as there could be many of these files open simultaneously.

EDIT 2

The files could be on a NAS. This is not by design, but simply recognition that in an enterprise environment, this architecture is used a lot and the solution should probably handle it (perhaps not optimally) and not prevent its use. AFAIK, this should not affect a solution based on write() and lseek(), but might invalidate some more esoteric solutions. 

rghome
  • Is the file size fixed? Or does it need to grow based on the key? I would simply use a `MappedByteBuffer` for the write operations.. If the file is too large or needs to grow, I would wrap this in a class which maps in "blocks" and then moves the block along as you are writing .. The algorithm for this is fairly straightforward.. Just pick a block size that makes sense for the data you are writing.. – Nim Jun 20 '17 at 08:28
  • The size of the file is not known ahead of time. The file could be on a network drive - I am not sure if this affects your solution – rghome Jun 20 '17 at 08:41
  • Have a look at `java.nio.channels`. You can do random access with a `FileChannel`, and write buffered data. – teppic Jun 20 '17 at 09:18
  • @rghome - it doesn't. All you need to do is, as you append data via your wrapper, "move" the mapped block along. You may need to grow the file before doing a new mapping if the next index to write is larger than the current file size. This should be a fairly straightforward thing to do, given you already know everything is fixed size. – Nim Jun 20 '17 at 10:10
  • Have you empirical proof that RandomAccessFile is slow? Java might not buffer it, but I would expect the OS to do so. – slim Jul 25 '17 at 08:46
  • In tests I got 5x better performance using serial I/O than random access I/O. – rghome Mar 12 '19 at 08:10

5 Answers

1

Edit/warning: there are potential gotchas with this solution, because it heavily uses MappedByteBuffer, and it's unclear how/when the corresponding resources are released. See this Q&A & JDK-4724038: (fs) Add unmap method to MappedByteBuffer.

That being said, please also see the end of this post.


I would do exactly what Nim suggested:

wrap this in a class which maps in "blocks" and then moves the block along as you are writing .. The algorithm for this is fairly straightforward.. Just pick a block size that makes sense for the data you are writing..

In fact, I did exactly that years ago and just dug up the code, it goes like this (stripped to the bare minimum for a demo, with a single method to write data):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;

/**
 * Writes at arbitrary offsets through a sliding memory-mapped window:
 * the file is mapped WINDOW_SIZE bytes at a time, and the window is
 * re-mapped whenever a write falls outside the current mapping.
 */
public class SlidingFileWriterThingy {

    private static final long WINDOW_SIZE = 8*1024*1024L;
    private final RandomAccessFile file;
    private final FileChannel channel;
    private MappedByteBuffer buffer;
    private long ioOffset;   // absolute file offset of the next write
    private long mapOffset;  // absolute file offset where the current window starts

    public SlidingFileWriterThingy(Path path) throws IOException {
        file = new RandomAccessFile(path.toFile(), "rw");
        channel = file.getChannel();
        remap(0);
    }

    public void close() throws IOException {
        buffer.force(); // flush outstanding changes in the mapped window to the file
        file.close();
    }

    public void seek(long offset) {
        ioOffset = offset;
    }

    public void writeBytes(byte[] data) throws IOException {
        if (data.length > WINDOW_SIZE) {
            throw new IOException("Data chunk too big, length=" + data.length + ", max=" + WINDOW_SIZE);
        }
        boolean dataChunkWontFit = ioOffset < mapOffset || ioOffset + data.length > mapOffset + WINDOW_SIZE;
        if (dataChunkWontFit) {
            remap(ioOffset); // slide the window so the chunk fits entirely within it
        }
        int offsetWithinBuffer = (int)(ioOffset - mapOffset);
        buffer.position(offsetWithinBuffer);
        buffer.put(data, 0, data.length);
        ioOffset += data.length; // advance so consecutive writes append rather than overwrite
    }

    private void remap(long offset) throws IOException {
        // Mapping past the current end of file grows the file to mapOffset + WINDOW_SIZE.
        mapOffset = offset;
        buffer = channel.map(FileChannel.MapMode.READ_WRITE, mapOffset, WINDOW_SIZE);
    }

}

Here is a test snippet:

SlidingFileWriterThingy t = new SlidingFileWriterThingy(Paths.get("/tmp/hey.txt"));
t.writeBytes("Hello world\n".getBytes(StandardCharsets.UTF_8));
t.seek(1000);
t.writeBytes("Are we there yet?\n".getBytes(StandardCharsets.UTF_8));
t.seek(50_000_000);
t.writeBytes("No but seriously?\n".getBytes(StandardCharsets.UTF_8));

And what the output file looks like:

$ hexdump -C /tmp/hey.txt
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a 00 00 00 00  |Hello world.....|
00000010  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000003e0  00 00 00 00 00 00 00 00  41 72 65 20 77 65 20 74  |........Are we t|
000003f0  68 65 72 65 20 79 65 74  3f 0a 00 00 00 00 00 00  |here yet?.......|
00000400  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
02faf080  4e 6f 20 62 75 74 20 73  65 72 69 6f 75 73 6c 79  |No but seriously|
02faf090  3f 0a 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |?...............|
02faf0a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
037af080

I hope I did not ruin everything by removing the unnecessary bits and renaming... At least the offset computation looks correct (0x3e0 + 8 = 1000, and 0x02faf080 = 50000000).

Number of blocks (left column) occupied by the file, and another non-sparse file of the same size:

$ head -c 58388608 /dev/zero > /tmp/not_sparse.txt
$ ls -ls /tmp/*.txt
    8 -rw-r--r-- 1 nug nug 58388608 Jul 19 00:50 /tmp/hey.txt
57024 -rw-r--r-- 1 nug nug 58388608 Jul 19 00:58 /tmp/not_sparse.txt

Number of blocks (and actual "sparseness") will depend on the OS & filesystem; the above was on Debian Buster with ext4. Sparse files are not supported on HFS+ for macOS, and on Windows they require the program to do something specific that I don't know enough about, but it does not seem easy or even doable from Java.

I don't have fresh numbers, but at the time this "sliding-MappedByteBuffer" technique was very fast, and as you can see above, it does leave holes in the file.
You'll need to adapt WINDOW_SIZE to something that makes sense for you and add all the writeThingy methods you need, perhaps by wrapping writeBytes, whatever suits you. Also, in this state it will grow the file as needed, but by chunks of WINDOW_SIZE, which you might also need to adapt.

Unless there is a very good reason not to, it's probably best to keep it simple with this single mechanism, rather than maintaining a complex dual-mode system.


About the fragility and memory consumption, I've run the stress test below on Linux without any issue for an hour, on a machine with 800GB of RAM, and on another very modest VM with 1G of RAM. The system looks perfectly healthy, and the Java process does not use any significant amount of heap memory.

    String path = "/tmp/data.txt";
    SlidingFileWriterThingy w = new SlidingFileWriterThingy(Paths.get(path));
    final long MAX = 5_000_000_000L;
    while (true) {
        long offset = 0;
        while (offset < MAX) {
            offset += Math.pow(Math.random(), 4) * 100_000_000;
            if (offset > MAX/5 && offset < 2*MAX/5 || offset > 3*MAX/5 && offset < 4*MAX/5) {
                // Keep 2 big "empty" bands in the sparse file
                continue;
            }
            w.seek(offset);
            w.writeBytes(("---" + new Date() + "---").getBytes(StandardCharsets.UTF_8));
        }
        w.seek(0);
        System.out.println("---");
        Scanner output = new Scanner(new ProcessBuilder("sh", "-c", "ls -ls " + path + "; free")
                .redirectErrorStream(true).start().getInputStream());
        while (output.hasNextLine()) {
            System.out.println(output.nextLine());
        }
        Runtime r = Runtime.getRuntime();
        long memoryUsage = (100 * (r.totalMemory() - r.freeMemory())) / r.totalMemory();
        System.out.println("Mem usage: " + memoryUsage + "%");
        Thread.sleep(1000);
    }

So yes, that's empirical; maybe it only works correctly on recent Linux systems, maybe it's just luck with that particular workload... but I'm starting to think it's a valid solution on some systems and workloads, and it can be useful.

Hugues M.
  • This will create a new mapped byte buffer every time you remap. There is no well-defined time at which these are released, so you are liable to run out of memory pretty quickly. – user207421 Jul 19 '17 at 00:09
  • It's true it relies on the garbage collector & probably OS mechanisms. It worked well enough for us with huge files on Linux, I'll check back SCM history and application usage, see if I find tricks or information about issues this can cause – Hugues M. Jul 19 '17 at 06:25
  • It is *not* true that it relies on the garbage collector. Read what I wrote. There is no well-defined time at which `MappedByteBuffers` can be garbage-collected. So they are more than liable *not* to be garbage-collected at all. Which causes memory exhaustion. This is a well-known issue with `MappedByteBuffers`. – user207421 Jul 19 '17 at 06:53
  • I was in the process of writing this when I got your comment ---- Wow there are more gotchas than I thought indeed, plenty discussed [here](https://stackoverflow.com/q/2972986/6730571). We did not use the mentioned "cleaner" tricks, but yes I see mentions `System.gc()` being needed in the main app that uses this sliding-thingy. Thanks for the warning. ---- I did read what you wrote. – Hugues M. Jul 19 '17 at 06:55
  • Thanks for your response. Memory is critical, as mentioned in the question. – rghome Jul 19 '17 at 09:10
  • That's off-heap memory though. I just ran that in a loop to seek to random position and write stuff, moving back to 0, looping again, etc. On Linux, with a 52GB file, on a JVM started with -Xmx128m, used heap cycles between 5MB and 30MB, like an idle JVM would do... Oh, and system cache memory is not filling up, I do see some cache usage in output of `free` command (and `htop`), but it does not fill up, system is healthy. But yeah I understand it could misbehave with different usage patterns, or maybe different systems... – Hugues M. Jul 19 '17 at 12:02
  • Added details about a stress-test I'm running. Still not seeing any actual issue, so I reworded the warning at the top because it was too harsh... this solution does have merit, and I invite you @rghome to try the stress-test, see for yourself (if you need to support many OSes, you might need to do many tests, though). – Hugues M. Jul 19 '17 at 21:55
  • I mentioned in the comments on the question that the file could be on a network drive (e.g., NFS). It is not intended that it is, but in enterprise environments operations teams do often use NAS for storage. So I am wondering if this would stop MappedByteBuffer from working, since it leverages the paging system. – rghome Jul 20 '17 at 16:17
  • Works for me on NFS too, sparseness included (just verified) – Hugues M. Jul 20 '17 at 16:29
  • I appreciate the effort you have put into the response. I think the "sliding window" solution is one way forward, but I am inclined to implement it without using memory mapped files (just to avoid the uncertainty there). It seems to me it would work well also sitting on top of a random access file, as long as the buffer could be tuned to the block size of the O/S. – rghome Jul 22 '17 at 22:19
0

You say millions of records of a few bytes. So let's assume it's 10 million records of 10 bytes each, which means that the file to write will be around 100 MB. These days, that's not much.

I would just create a Map in which all key-value pairs are stored, then write a function that serializes the contents of the map to a byte[], and then simply Files.write() the bytes to disk. Then replace the old file with the new file. Or, better yet, move the old file first, then move the new one into place.
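
A minimal sketch of that idea, assuming fixed 12-byte records, a key-to-offset mapping of key * RECORD_LENGTH, and a file small enough to fit in a byte[] (all names here are illustrative):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import java.util.TreeMap;

public class WriteWholeFile {

    static final int RECORD_LENGTH = 12; // assumed layout: 8-byte key + 4-byte value

    static void writeAll(Path target, TreeMap<Long, Integer> records) throws IOException {
        // Size the array from the largest key; gaps simply stay zero-filled.
        int size = (int) ((records.lastKey() + 1) * RECORD_LENGTH);
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (Map.Entry<Long, Integer> e : records.entrySet()) {
            buf.position((int) (e.getKey() * RECORD_LENGTH));
            buf.putLong(e.getKey());
            buf.putInt(e.getValue());
        }
        // Write a new file next to the old one, then swap it into place.
        Path tmp = Paths.get(target.toString() + ".tmp");
        Files.write(tmp, buf.array());
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}

Note that this version materializes the zero-filled gaps on disk, so it trades the sparseness of the seek-based approaches for simplicity.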

Dariusz
  • A Map to map numbers to other numbers is extremely inefficient. You could use a custom map from Colt or Trove, but even then it's still not great. – rghome Jul 25 '17 at 10:58
0

I assume that once your keys have been increasing sequentially for a while and a gap occurs, no further key will be added to the "finished" sequence. If this is correct, then I would suggest the following solution.

As long as your keys keep increasing sequentially keep working with your 1st approach:

write using a DataOutputStream wrapping a BufferedOutputStream, setting the buffer size to some number (e.g. 64k) to maximize I/O throughput.

Write your data into a temp file. Once a gap occurs, start writing to the next temp file and keep a record of your temp files. This way you get a file per sequence of records without gaps. Once you have finished processing the data for your main file, have a separate method that smartly concatenates your temp files into a final file. This is an easy task, since you know that each temp file doesn't have any gaps.
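
A hedged sketch of the concatenation step (the class, method, and parameter names are mine; it assumes each temp file holds one gap-free run and is recorded together with the absolute offset at which that run starts in the final file):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

public class TempFileConcatenator {

    // Stitches the gap-free temp files into the final file. Each map entry pairs the
    // absolute byte offset of a run with the temp file holding that run. Positioning
    // past the current end of the target leaves a hole on sparse-aware filesystems.
    static void concatenate(Path target, Map<Long, Path> runsByOffset) throws IOException {
        try (FileChannel out = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            for (Map.Entry<Long, Path> run : runsByOffset.entrySet()) {
                try (FileChannel in = FileChannel.open(run.getValue(), StandardOpenOption.READ)) {
                    out.position(run.getKey()); // jump to where this run belongs
                    long size = in.size();
                    long transferred = 0;
                    while (transferred < size) {
                        // transferTo writes at out's current position and advances it
                        transferred += in.transferTo(transferred, size - transferred, out);
                    }
                }
            }
        }
    }
}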

Michael Gantman
  • I think the downside here is that you are going to end up writing the file twice. – rghome Jul 25 '17 at 10:50
  • You are correct, but the concatenation task could be done at a later stage and not take critical resources when the system is busy. The advantage is that you will work very efficiently (performance-wise) while writing your sequential chunks, and the logic is very simple. – Michael Gantman Jul 25 '17 at 12:07
0

My first effort at this would be to simply use RandomAccessFile naively and see if it is fast enough. I would actually be surprised if it is slow -- although Java won't buffer it, the filesystem implementation will.


If there really are performance problems, my next effort would be to wrap the RandomAccessFile in a buffering facade, with write logic along the lines of (java-ish pseudocode):

void write(record, location) {
    if (location != lastLocation + recordLength) {
        flushBufferToRandomAccessFile();
    }
    addToBuffer(record);
    flushBufferToRandomAccessFileIfFull();
    lastLocation = location;
}

The buffer would be a byte[]. The potential win here is that you're doing fewer randomAccessFile.write(buffer, 0, longLength) calls instead of many randomAccessFile.write(record, 0, shortLength) calls.

You could tidy this up a bit by encapsulating all the necessary info about a buffered block in a Buffer class -- bytes, start location, end location. You'll also need to flush the buffer to the file in a close() method.

That is, you're collecting blocks of records in heap memory, flushing to RandomAccessFile:

  • when you reach the size of your buffer,
  • when a record location isn't contiguous with the current buffered block
  • after the last record

I appreciate that you don't want to waste memory -- but regardless of whether it's in the heap or elsewhere, memory is memory, and you can't have buffering without it. With this solution you can tune the size of your buffer - and even if it's only enough for two records, it could halve the number of writes.
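
A minimal, runnable sketch of such a facade, under the assumption of fixed-length records (the class and field names are mine; this is not a standard Java class):

import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

public class BufferedRecordWriter implements Closeable {

    private final RandomAccessFile file;
    private final int recordLength;
    private final byte[] buffer;
    private int buffered;           // bytes currently held in the buffer
    private long bufferStart = -1;  // file offset of the first buffered byte
    private long nextLocation = -1; // expected location of a contiguous record

    public BufferedRecordWriter(String path, int recordLength, int bufferRecords) throws IOException {
        this.file = new RandomAccessFile(path, "rw");
        this.recordLength = recordLength;
        this.buffer = new byte[recordLength * bufferRecords];
    }

    public void write(byte[] record, long location) throws IOException {
        // Flush when the record is not contiguous with the buffered block or the buffer is full.
        if (location != nextLocation || buffered + recordLength > buffer.length) {
            flush();
            bufferStart = location;
        }
        System.arraycopy(record, 0, buffer, buffered, recordLength);
        buffered += recordLength;
        nextLocation = location + recordLength;
    }

    private void flush() throws IOException {
        if (buffered > 0) {
            file.seek(bufferStart);
            file.write(buffer, 0, buffered); // one larger write instead of many small ones
            buffered = 0;
        }
    }

    @Override
    public void close() throws IOException {
        flush(); // push out the last partial block
        file.close();
    }
}

With, say, a 64k buffer this collapses runs of contiguous records into single write() calls, while isolated records still cost one seek and one small write each.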

If you want to be fanatical about memory usage, you're using the wrong language.


If that was still not fast enough, I'd consider moving the writes into another thread. So write your records to a queue, and let a file-writing thread consume from the queue. This won't make the file writing any faster in itself, but it means that the consumer can catch up on a backlog while the producer is doing different work -- so its utility depends on whether the producer has such other work to do.
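
A rough sketch of that hand-off, using a bounded queue and a poison pill for shutdown (the names and the single-writer-thread design are assumptions, not a prescribed API):

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncRecordWriter {

    // Value holder for a pending write; a zero-length record is used as the poison pill.
    private static final class PendingWrite {
        final long location;
        final byte[] data;
        PendingWrite(long location, byte[] data) {
            this.location = location;
            this.data = data;
        }
    }

    private final BlockingQueue<PendingWrite> queue = new ArrayBlockingQueue<>(1024);
    private final RandomAccessFile file;
    private final Thread writerThread;

    public AsyncRecordWriter(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        writerThread = new Thread(() -> {
            try {
                while (true) {
                    PendingWrite w = queue.take(); // block until the producer offers work
                    if (w.data.length == 0) {
                        return;                    // poison pill: stop the consumer
                    }
                    file.seek(w.location);
                    file.write(w.data);
                }
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        }, "record-writer");
        writerThread.start();
    }

    public void write(long location, byte[] data) throws InterruptedException {
        queue.put(new PendingWrite(location, data)); // blocks (backpressure) when the queue is full
    }

    public void close() throws IOException, InterruptedException {
        queue.put(new PendingWrite(0, new byte[0])); // enqueue the poison pill
        writerThread.join();                         // wait for the backlog to drain
        file.close();
    }
}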

slim
  • I think this is a viable solution, although I would not flush the whole buffer if there was just a small gap. Allocating a few K for the buffer is acceptable for memory usage. I have to say, though, I was hoping that there was a standard Java class somewhere that did this without me having to write one. – rghome Jul 25 '17 at 10:53
  • Of course, you could include short empty blocks in the buffer -- but you're chasing micro-optimisations, and there would be diminishing returns. – slim Jul 25 '17 at 11:05
-1

I've changed my mind on this. You should use MappedByteBuffer. It is paged by the operating system as part of the virtual memory subsystem, which satisfies your buffering requirement; it is as fast as a write to memory when writing; and it is subject to the operating system's behaviour when writing files with holes, which satisfies that requirement.

user207421
  • Yes - I mentioned RandomAccessFile in my question - I know how to use that. However, the writing is unbuffered and therefore extremely slow compared to writing sequentially with a buffer. Remember that the records are small. What I want is buffered and random access (I want to have my cake and eat it). – rghome Jun 20 '17 at 11:40
  • So you would map the entire file once? And how do you handle the need to write past the end of the file? I suppose that needs remapping... and then we run into the same gotchas you mentioned about my answer... Or am I missing something? – Hugues M. Jul 19 '17 at 07:10