
As far as I know, MappedByteBuffer has several benefits, such as:

  1. It maps user-space memory addresses to kernel-space memory addresses, so it avoids the memory copy from kernel space to user space when reading from a file.

  2. After the first read of a piece of the file (e.g. the range from offset 0 to 100 in the buffer), that piece is cached in memory, so the second time you read the same piece from the buffer you are reading it directly from memory and not from disk.

My question is:

  1. Is my understanding above correct?

  2. If my understanding is correct, when you read a piece just once (and never read it again, so it is never served from memory), is there any difference between using FileChannel.read(buffer, position) and using MappedByteBuffer.get(byte[], offset, length)? (See the sketch after this list.)

  3. For random access to a file (not reading the same piece repeatedly), would FileChannel be more efficient, since MappedByteBuffer would take up the memory it maps while FileChannel needs no memory?

  4. What is the difference between using a MappedByteBuffer and simply loading the whole file into memory? Is the benefit of MappedByteBuffer that it uses memory outside the JVM heap, so there is no GC concern?
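
For reference, here is a minimal sketch of the two access paths I am comparing (the file name data.bin and the 100-byte piece are just placeholders I made up):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ReadComparison {
        public static void main(String[] args) throws IOException {
            byte[] piece = new byte[100];

            try (FileChannel channel = FileChannel.open(Paths.get("data.bin"),
                                                        StandardOpenOption.READ)) {
                // Path 1: FileChannel.read copies bytes 0..99 into my own buffer
                ByteBuffer buffer = ByteBuffer.wrap(piece);
                channel.read(buffer, 0);

                // Path 2: map the file and read the same region through the mapping
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                mapped.get(piece, 0, piece.length);
            }
        }
    }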

Alexis

1 Answer


Let me answer your questions one by one.

As far as I know, MappedByteBuffer has several benefits, such as:

  1. It maps user-space memory addresses to kernel-space memory addresses, so it avoids the memory copy from kernel space to user space when reading from a file.

  2. After the first read of a piece of the file (e.g. the range from offset 0 to 100 in the buffer), that piece is cached in memory, so the second time you read the same piece from the buffer you are reading it directly from memory and not from disk.

Your statements are not invalid, though it is important not to miss a simple fact: access to file data practically always goes through the page cache (with rare exceptions). With memory mapping, pages from that cache are mapped directly into your address space; with FileChannel, an extra memory copy is involved.

If my understanding is correct, when you read a piece just once (and never read it again, so it is never served from memory), is there any difference between using FileChannel.read(buffer, position) and using MappedByteBuffer.get(byte[], offset, length)?

No, FileChannel.read(buffer, position) involves an extra memory copy. The data is going to hang around in the page cache for some time anyway.

For random access to a file (not reading the same piece repeatedly), would FileChannel be more efficient, since MappedByteBuffer would take up the memory it maps while FileChannel needs no memory?

Your reasoning is incorrect. With either access pattern, FileChannel does an extra memory-to-memory copy and MappedByteBuffer doesn't. In addition, memory mapping is essentially lazy: data is loaded from disk only when you access the corresponding memory page.
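
To make the laziness concrete, a minimal sketch (the file name is hypothetical): map() returns without reading any file data, the first get() faults the page in from disk, and load() can be used to pre-fault the whole mapping:

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class LazyMapping {
        public static void main(String[] args) throws IOException {
            try (FileChannel channel = FileChannel.open(Paths.get("big.bin"),
                                                        StandardOpenOption.READ)) {
                // map() only establishes the mapping; no file data is read yet
                MappedByteBuffer mapped =
                        channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());

                // The first access to a page triggers the actual disk read (page fault)
                byte first = mapped.get(0);

                // Optionally ask the OS to fault the whole mapping in up front
                mapped.load();
                System.out.println(first + " loaded=" + mapped.isLoaded());
            }
        }
    }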

What is the difference between using a MappedByteBuffer and simply loading the whole file into memory? Is the benefit of MappedByteBuffer that it uses memory outside the JVM heap, so there is no GC concern?

You can map a file orders of magnitude larger than the physical memory on your box (a single MappedByteBuffer is limited to 2 GiB, so multiple mappings would be required). A page of file data accessed through the mapping can be reclaimed by the OS at any moment. As far as GC is concerned, indeed, a MappedByteBuffer does not occupy heap space.
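
A rough sketch of covering a file larger than 2 GiB with several mappings (the file name and the flat chunking scheme are illustrative, not a tuned implementation):

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;

    public class LargeFileMapping {
        public static void main(String[] args) throws IOException {
            long chunkSize = Integer.MAX_VALUE;      // upper bound for a single mapping
            List<MappedByteBuffer> chunks = new ArrayList<>();

            try (FileChannel channel = FileChannel.open(Paths.get("huge.bin"),
                                                        StandardOpenOption.READ)) {
                long fileSize = channel.size();
                for (long pos = 0; pos < fileSize; pos += chunkSize) {
                    long size = Math.min(chunkSize, fileSize - pos);
                    chunks.add(channel.map(FileChannel.MapMode.READ_ONLY, pos, size));
                }
            }
            // A byte at an absolute offset can then be read as
            // chunks.get((int) (offset / chunkSize)).get((int) (offset % chunkSize))
            System.out.println("mapped " + chunks.size() + " chunk(s)");
        }
    }

The mappings stay valid after the channel is closed; the OS decides how much of the file actually sits in physical memory at any time.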

What to choose between FileChannel and MappedByteBuffer?

Using memory-mapped data has other nasty implications.

  1. Any access to the data in memory can become an IO operation (if the memory page is not cached), i.e. every ByteBuffer.get() call is potentially blocking.
  2. A MappedByteBuffer cannot be disposed of explicitly; the memory mapping stays active until the buffer is cleaned up by GC (a rough sketch of the usual workaround follows after this list).
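
To illustrate the second point: there is no supported call to release the mapping. The workaround sometimes seen in the wild reaches into the internal sun.misc.Unsafe.invokeCleaner API (JDK 9+); it is unsupported, and touching the buffer after this call will crash the JVM, so treat the sketch below as a last resort, not a recommendation:

    import java.lang.reflect.Field;
    import java.nio.MappedByteBuffer;

    public class Unmapper {
        // Unsupported workaround: releases the mapping via sun.misc.Unsafe (JDK 9+).
        // Any access to the buffer after this call crashes the JVM.
        static void unmap(MappedByteBuffer buffer) throws ReflectiveOperationException {
            Field f = Class.forName("sun.misc.Unsafe").getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            unsafe.getClass()
                  .getMethod("invokeCleaner", java.nio.ByteBuffer.class)
                  .invoke(unsafe, buffer);
        }
    }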

That makes MappedByteBuffer an exotic and rarely used way to access data.

I would advise you to avoid MappedByteBuffer if:

  1. Your application is interactive and response time is important.
  2. You are actively using multiple threads to process data (a single thread stuck on IO may cause cascading blocking of other threads).
  3. You want non-blocking file IO.
Alexey Ragozin
  • Thanks @Alexey, it seems I had some misunderstanding of memory-mapped files; I thought it would take actual physical memory corresponding to the mapped size, but now I know it does not. So how much physical memory would it occupy if I map a 1.5G file, is it undefined? – Alexis Apr 25 '18 at 00:43
  • @Alexis it is totally in the hands of the OS. If you have spare RAM, the whole file is likely to be cached. – Alexey Ragozin Apr 25 '18 at 07:59