
For both on-heap and off-heap allocations. On-heap, in the context of the three major garbage collectors: CMS, Parallel Old, and G1.

What I know (or think I know) at the moment:

  • all object (on-heap) allocations are rounded up to an 8-byte boundary (or a larger power of 2, configured by -XX:ObjectAlignmentInBytes); see the arithmetic sketch after this list.
  • G1
    • For on-heap allocations smaller than the region size (1 to 32 MB, likely around heap size / 2048) there is no internal fragmentation, because the allocator never needs to "fill holes".
    • For allocations larger than the region size, G1 rounds the allocation up to a whole number of regions. E.g. an allocation of the region size + 1 byte is very unlucky: it wastes almost 50% of the memory it occupies.
  • For CMS, the only relevant information I found is

    Naturally old space PLABs mimic structure of indexed free list space. Each thread preallocates certain number of chunk of each size below 257 heap words (large chunk allocated from global space).

    From http://blog.ragozin.info/2011/11/java-gc-hotspots-cms-promotion-buffers.html. As far as I understand, the "global space" referred to is the main old space. (257 heap words is about 2 KB, assuming 8-byte heap words.)
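To make the rounding described above concrete, here is a small arithmetic sketch (plain Java, just illustrating the math; the 8 MB region size and 8-byte alignment are example values, not something I have verified for a particular JVM):

```java
public class FragmentationMath {

    // On-heap: object sizes are rounded up to the alignment
    // (-XX:ObjectAlignmentInBytes, 8 by default; must be a power of 2).
    static long alignedObjectSize(long rawSize, long alignment) {
        return (rawSize + alignment - 1) & ~(alignment - 1);
    }

    // G1: an allocation larger than the region size occupies whole contiguous
    // regions, so the tail of the last region is wasted.
    static long g1Footprint(long allocationSize, long regionSize) {
        long regions = (allocationSize + regionSize - 1) / regionSize;
        return regions * regionSize;
    }

    public static void main(String[] args) {
        long alignment = 8;                  // default -XX:ObjectAlignmentInBytes
        long regionSize = 8L * 1024 * 1024;  // example G1 region size: 8 MB

        System.out.println(alignedObjectSize(17, alignment));   // prints 24

        long alloc = regionSize + 1;                             // region size + 1 byte
        long footprint = g1Footprint(alloc, regionSize);         // 2 regions = 16 MB
        System.out.printf("wasted %d of %d bytes (%.1f%%)%n",
                footprint - alloc, footprint,
                100.0 * (footprint - alloc) / footprint);        // ~50% wasted
    }
}
```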

Questions:

  • Are the above statements correct?
  • What are the fragmentation properties of the main old space in CMS? What about allocations of more than "257 heap words"?
  • How is the old space managed with the Parallel Old GC?
  • Does the Hotspot JVM use the system memory allocator for off-heap allocations, or does it manage them with its own allocator?
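
For concreteness, by "off-heap allocations" I mean paths like the following (an illustrative sketch; the reflective access to theUnsafe is just the usual way to get at sun.misc.Unsafe for demonstration):

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class OffHeapPaths {
    public static void main(String[] args) throws Exception {
        // Path 1: a direct (off-heap) NIO buffer.
        ByteBuffer direct = ByteBuffer.allocateDirect(1024 * 1024);
        direct.putLong(0, 42L);

        // Path 2: raw native memory via Unsafe (allocateDirect ends up here too).
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        // Whether this call goes straight to the system allocator (malloc) or to
        // a JVM-managed allocator is exactly what I'm asking about.
        long address = unsafe.allocateMemory(1024 * 1024);
        try {
            unsafe.putLong(address, 42L);
            System.out.println(unsafe.getLong(address));
        } finally {
            unsafe.freeMemory(address);
        }
    }
}
```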

UPD. A discussion thread: https://groups.google.com/forum/#!topic/mechanical-sympathy/A-RImwuiFZE

leventov
  • Why do you want to do this? Remember, this stuff changes from implementation to implementation, and from update to update. If you're trying to optimize, I think an up-to-date article might be your best bet. 2011 was a while ago. – markspace Jun 23 '15 at 17:12
  • Google is your friend (use Search Tools -> Within One Year): [March 2015 JVM GC Tuning Guide](https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/) – markspace Jun 23 '15 at 17:15
  • @markspace I've read this before posting. This guide says nothing about internal fragmentation. – leventov Jun 23 '15 at 17:38
  • Regarding your 4th point, looking at the source, it is fairly easy to find out that, on the current hotspot, they are using a plain malloc to do allocations. The entry point is sun.misc.Unsafe.allocateMemory – Alexandre de Champeaux Jun 29 '15 at 17:42
  • I'm honestly puzzled: the question does not have a simple answer, but please consider that, just like @markspace said, you *should tread carefully* when working with low-level topics like fragmentation, the arrangement of generations, etc. The implementation of these can vary wildly from one version to another (major) or slightly from one update to another, thus making your implementation a bit of a shot in the dark. If, on the other hand, your question is knowledge for the sake of knowledge, I'd like to hear experts' opinions on some of your points as well! – witchedwiz Jun 30 '15 at 09:30
  • @leventov Regarding the bounty and the attention this question has received, I think it's too broad. You're asking fairly detailed information about three different GCs, and you're also asking several questions. If you ask again, try to narrow it down to one GC at a time, and reduce the requested information to one or two points. You'll have to ask several questions to get all your answers, but I think perhaps the focus might make this easier for folks to answer. – markspace Jun 30 '15 at 15:20
  • @AlexandredeChampeaux `Unsafe.allocateMemory` is not where most of the allocation is done. Most of it occurs before that can even be called (during start-up), and growth of the heaps happens with single, large allocations. Ordinary object allocation does not go through that path, and neither does any other region of managed memory such as where the JIT compiled bytecode goes. – Scott Carey Jul 01 '15 at 23:14
  • @ScottCarey This depends on your use case. But yes I assumed the information he wanted to have was on native byte buffers, which was indeed a bit restrictive. – Alexandre de Champeaux Jul 02 '15 at 08:28

1 Answer

  • As far as I understand, the statements above are correct, although the bit on CMS is missing a lot of context to interpret it.
  • CMS is prone to fragmentation (in its old space, where CMS runs), which is one of its major flaws. If it fragments too much, it may occasionally have to stop the world and do a full mark and (sliding) compaction to remove the fragmentation, which leads to a large pause in the application. It is this flaw that is often cited as why G1 was developed. Some systems (e.g. HBase) purposely do most of their allocations with fixed-size blocks in order to prevent or significantly reduce CMS fragmentation and avoid long stop-the-world pauses (see the sketch after this list).
  • ParallelOldGC (or 'Old GC' in general) does not fragment. Objects are tenured to the old heap and when it runs out of space, a full mark and compact cycle is run. It can do this full GC faster than any of the other collectors, but with a typical run time of 1 second per 2 GB of heap, this can be too long for large heaps or latency sensitive applications.
  • Hotspot has used various strategies for off-heap allocation depending on the purpose. Allocating native byte buffers differs from its own allocation for compiled code or profiling data. I cannot answer with authority on the details here, but I assume that much of this does not use the system allocator, else Hotspot would not perform as well as it does. Furthermore, there are parameters one can tune that control some of this space, e.g. -XX:ReservedCodeCacheSize, which suggests such a region of memory is managed through indirection and not directly via the system allocator. In short, I would be rather surprised if the system allocator was directly used for any fine-grained allocation at all in Hotspot.
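
To illustrate the fixed-size-block idea from the CMS bullet above, here is a minimal sketch. It is only an illustration of the technique, similar in spirit to HBase's MemStore-Local Allocation Buffers but not actual HBase code: small, variably sized values are copied into chunks of one uniform size, so the old generation mostly sees identically sized byte arrays and its free lists fragment far less.

```java
// Sketch of the fixed-size-block technique: copy values into uniform chunks so
// tenured objects are (mostly) 2 MB byte[]s rather than millions of
// differently sized objects.
public class ChunkAllocator {
    static final int CHUNK_SIZE = 2 * 1024 * 1024; // single, uniform chunk size

    private byte[] currentChunk = new byte[CHUNK_SIZE];
    private int offset = 0;

    /** Copies {@code value} into a chunk and returns its location. */
    synchronized Slot copyIn(byte[] value) {
        if (value.length > CHUNK_SIZE) {
            throw new IllegalArgumentException("allocate oversized values directly");
        }
        if (offset + value.length > CHUNK_SIZE) {
            currentChunk = new byte[CHUNK_SIZE]; // retire the full chunk, start a new one
            offset = 0;
        }
        int pos = offset;
        System.arraycopy(value, 0, currentChunk, pos, value.length);
        offset += value.length;
        return new Slot(currentChunk, pos, value.length);
    }

    /** A reference to the copied bytes inside a shared chunk. */
    static final class Slot {
        final byte[] chunk;
        final int offset;
        final int length;
        Slot(byte[] chunk, int offset, int length) {
            this.chunk = chunk;
            this.offset = offset;
            this.length = length;
        }
    }
}
```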
Scott Carey