23

I've often read that in the Sun JVM, short-lived objects ("relatively new objects") can be garbage collected more efficiently than long-lived objects ("relatively old objects"):

  • Why is that so?
  • Is that specific to the Sun JVM or does this result from a general garbage collection principle?
Daniel Rikowski
  • 2
    It seems to me a lot of the answers are _describing_ garbage collection rather than the reason _why_ an eden garbage collection is quicker than a survivor-space garbage collection, unless they are implying it is the copying of references to the longer term pools that takes the time. – James B Apr 12 '10 at 08:42

8 Answers

23

Most Java apps create objects and then discard them rather quickly, e.g. you create some objects in a method, and once you exit the method all those objects die. Most apps behave this way, and most people tend to code their apps this way. The Java heap is roughly broken up into three parts: the permanent generation, the old (long-lived) generation, and the young (short-lived) generation. The young gen is further broken up into two survivor spaces (S1 and S2) and eden. These are just separate heap regions.

Most objects are created in the young gen. The idea here is that, since the mortality rate of objects is high, we quickly create them, use them and then discard them. Speed is of the essence. As you create objects, the young gen fills up, until a minor GC occurs. In a minor GC, all objects that are still alive are copied from eden and one survivor space (say S2) into the other (S1). Then the allocation 'pointer' is reset, leaving eden and S2 empty.

Every copy ages the object. If an object survives enough copies, i.e. enough minor GCs, to reach the tenuring threshold (15 by default for most HotSpot collectors), the GC figures that it is going to be around for a lot longer. So it tenures the object, moving it to the old generation. The old gen is just one big space. When the old gen fills up, a full GC, or major GC, happens in the old gen. Because there is no other space to copy to, the GC has to compact in place. This is a lot slower than a minor GC, which is why we want it to happen as rarely as possible.
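
To make that life cycle concrete, here is a minimal sketch (the class name and numbers are made up for illustration): the scratch buffers die in eden almost immediately and are never copied, while the entries kept in the long-lived list survive successive minor GCs and are eventually tenured into the old generation.

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of object lifetimes, not a benchmark.
public class GenerationsDemo {
    // Long-lived: entries accumulate here, survive many minor GCs,
    // and are eventually tenured into the old generation.
    private static final List<byte[]> cache = new ArrayList<>();

    public static void main(String[] args) {
        for (int i = 0; i < 1_000_000; i++) {
            byte[] scratch = new byte[256]; // allocated in eden
            scratch[0] = (byte) i;          // used briefly...
            if (i % 1000 == 0) {
                cache.add(scratch);         // ~0.1% of objects are kept alive
            }
            // ...the other 99.9% are unreachable by the next iteration
            // and cost a minor GC nothing, because they are never copied.
        }
        System.out.println("retained: " + cache.size());
    }
}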

You can tune the tenuring parameter with

java -XX:MaxTenuringThreshold=16 

if you know that you have lots of long-lived objects. You can print the various age buckets of your app with

java -XX:+PrintTenuringDistribution
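
For example, combining both flags with -verbose:gc (MyApp here is a stand-in for your own main class) lets you watch objects age across minor collections:

java -verbose:gc -XX:+PrintTenuringDistribution -XX:MaxTenuringThreshold=16 MyApp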
Stefanos T.
Chuk Lee
  • What's the point of this division of the heap into parts? Does minor GC mean that the GC traverses only the object graph for the young generation instead of the whole heap? – Malachiasz Feb 28 '14 at 15:51
  • 1
    Minor GC only happens in young gen. See http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html for a detailed explanation on the generational spaces. – Chuk Lee Mar 01 '14 at 00:38
12

(See the other answers for a more general explanation of GC; this one answers WHY new objects are cheaper to collect than old ones.)

The reason eden can be cleared faster is simple: the cost of the algorithm is proportional to the number of objects that will survive the GC in the eden space, not to the number of live objects in the whole heap. That is, if you have an average object death rate of 99% in eden (i.e. 99% of objects do not survive the GC, which is not abnormal), you only need to look at and copy that 1%. For an "old" GC, all live objects in the full heap need to be marked and swept. That is significantly more expensive.
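
To put rough numbers on that proportionality argument (the figures below are invented purely for illustration):

// Invented numbers, just to show the proportionality argument.
public class GcCostSketch {
    public static void main(String[] args) {
        long allocatedInEden = 10_000_000L;
        double deathRate     = 0.99;          // 99% die before the minor GC
        long liveInOldGen    = 50_000_000L;

        long minorGcWork = (long) (allocatedInEden * (1 - deathRate)); // survivors only
        long fullGcWork  = liveInOldGen + minorGcWork;                 // every live object

        System.out.println("minor GC examines ~" + minorGcWork + " objects"); // ~100000
        System.out.println("full GC examines  ~" + fullGcWork + " objects");  // ~50100000
    }
}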

Trent Gray-Donald
4

This is generational garbage collection. It's used pretty widely these days; see the Wikipedia article on garbage collection for more.

Essentially, the GC assumes that new objects are more likely to become unreachable than older ones.

issa marie tseng
  • 1
    This answer is devoid of meaningful information for answering the question. – stackoverflowuser2010 May 27 '16 at 03:10
  • Thanks for your opinion on this six-year-old answer, in which I provided a formal name for the hazy concept the person was looking for and linked out to a very detailed reference describing exactly what they were looking for. – issa marie tseng Jun 12 '16 at 23:42
4

There is this phenomenon that "most objects die young". Many objects are created inside a method and never stored in a field. Therefore, as soon as the method exits, these objects "die" and thus become candidates for collection at the next collection cycle.

Here is an example:

public String concatenate(int[] arr) {
  StringBuilder sb = new StringBuilder(); // reachable only from this stack frame
  for (int i = 0; i < arr.length; ++i)
    sb.append(i > 0 ? "," : "").append(arr[i]);
  return sb.toString();
}

The sb object will become garbage as soon as the method returns.
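
For contrast, here is a hypothetical variant in which the builder escapes into a field: it now survives the call, gets copied by each minor GC, and may eventually be tenured into the old generation.

public class Concatenator {
  private final StringBuilder sb = new StringBuilder(); // escapes: lives as long as this object

  public String concatenate(int[] arr) {
    sb.setLength(0);                                    // reused across calls
    for (int i = 0; i < arr.length; ++i)
      sb.append(i > 0 ? "," : "").append(arr[i]);
    return sb.toString();
  }
}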

By splitting the object space into two (or more) age-based areas, the GC can be more efficient: instead of frequently scanning the entire heap, the GC frequently scans only the nursery (the area for young objects), which, obviously, takes much less time than a full heap scan. The area for older objects is scanned less frequently.

Itay Maman
2

Young objects are managed more efficiently (not only collected; accesses to young objects are also faster) because they are allocated in a special area (the "young generation"). That special area is more efficient because it is collected "in one go" (with all threads stopped), and neither the collector nor the application code has to deal with concurrent access from the other.

The trade-off here is that the "world" is stopped when the "efficient area" is collected. This may induce a noticeable pause. The JVM keeps pause times low by keeping the efficient area small enough. In other words, if there is an efficiently-managed area, then that area must be small.

A very common heuristic, applicable to many programs and programming languages, is that many objects are very short-lived, and most write accesses occur in young objects (those which were created recently). It is possible to write application code which does not work that way, but this heuristic will be "mostly true" for "most applications". Thus, it makes sense to store young objects in the efficiently-managed area, which is what the JVM GC does, and why that efficient area is called the "young generation".

Note that there are systems where the whole memory is handled "efficiently" in this sense. When the GC must run, the application becomes "frozen" for a few seconds. This is harmless for long-running computations, but detrimental to interactivity, which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation.
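
As a rough sketch of why that special area is so cheap to manage (this is conceptual Java, not HotSpot's actual implementation): allocation is just a pointer bump into a contiguous region, and once the survivors have been evacuated, "collecting" the region amounts to resetting that pointer.

// Conceptual sketch of a bump-pointer nursery, not HotSpot's real code.
final class BumpRegion {
    private final byte[] space;
    private int top = 0;                           // next free offset

    BumpRegion(int size) { space = new byte[size]; }

    /** Returns the offset of the new "object", or -1 if the region is full. */
    int allocate(int size) {
        if (top + size > space.length) return -1;  // full: time for a minor GC
        int addr = top;
        top += size;                               // the whole allocation: one addition
        return addr;
    }

    /** After live objects are evacuated elsewhere, the region is empty again. */
    void reset() { top = 0; }
}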

Thomas Pornin
  • "The trade-off, here, is that the "world" is stopped when the "efficient area" is collected". You wouldn't usually stop the world to collect the young generation. – J D Feb 04 '12 at 11:57
  • "which is why most modern GC-enabled programming environments use generational GC with a limited-size young generation". Generational GC doesn't really help latency much. Throughput was the real motivation for choosing generational GC but that is now being challenged by mark-region collectors. – J D Feb 04 '12 at 12:00
1

This is based on the observation that the life-expectancy of an object goes up as it ages. So it makes sense to move objects to a less-frequently collected pool once they reach a certain age.

This isn't a fundamental property of the way programs use memory. You could write a pathological program that kept all objects around for a long time (and the same length of time for all objects), but this tends not to happen by accident.
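
For instance, here is a hypothetical sketch of such a pathological program: a fixed-size FIFO window gives every object the same medium lifetime, so with a large enough window, objects are tenured shortly before they die, pushing all the reclamation work onto the expensive old-generation collector.

import java.util.ArrayDeque;

// Hypothetical sketch of a generational-GC-hostile workload: every object
// lives exactly WINDOW allocations, so all objects share the same medium
// lifetime. With a large enough WINDOW, objects survive enough minor GCs
// to be tenured just before they die, so reclamation falls to the old gen.
public class UniformLifetimes {
    private static final int WINDOW = 1_000_000;

    public static void main(String[] args) {
        ArrayDeque<byte[]> window = new ArrayDeque<>(WINDOW);
        for (long i = 0; i < 100_000_000L; i++) {
            window.addLast(new byte[128]);    // born now...
            if (window.size() > WINDOW) {
                window.removeFirst();         // ...dies WINDOW allocations later
            }
        }
    }
}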

Marcelo Cantos
  • 1
    Most of the code I churn out could be called pathological. Some of it is downright sociopathic :-) – paxdiablo Apr 12 '10 at 07:52
  • "You could write a pathological program...but this tends not to happen by accident". I think many programs are pathological for generational GCs. Queues, hash tables and caches all are. – J D Feb 04 '12 at 23:50
  • @JonHarrop: Queues are skewed towards shortness, hash tables have no say in how long their contents hang around, and cache-object longevity is tied to popularity, which is normally heavily skewed. You could coerce these data structures to behave pathologically, but they aren't inherently so. – Marcelo Cantos May 19 '14 at 23:38
  • @MarceloCantos: Consider the use of a queue in the breadth-first search graph algorithm. For what graphs will that queue abide by the generational hypothesis, i.e. have super-exponential lifetime decay? Also, what is the relationship between popularity of cache lines and lifetime? Again, it is not at all obvious that caches will abide by the generational hypothesis. – J D May 21 '14 at 00:06
  • @JonHarrop: You lost me there. What do hardware cache lines have to do with GC? – Marcelo Cantos May 21 '14 at 03:33
  • @MarceloCantos: Caches the abstract data structure, not hardware caches. – J D May 21 '14 at 20:38
  • @JonHarrop: What's a cache line in this context? – Marcelo Cantos May 22 '14 at 01:35
  • The things that get loaded/evicted. – J D May 22 '14 at 20:04
1

The JVM (usually) uses a generational garbage collector. This kind of collector separates the heap memory into several pools, according to the age of the objects they contain. The reasoning here is based on the observation that most objects are short-lived, so if you do a garbage collection on an area of memory with "young" objects, you can reclaim relatively more memory than if you do garbage collection across "older" objects.

In the HotSpot JVM, new objects get allocated in the so-called Eden area. When this area fills up, the JVM will sweep the Eden area (which does not take too much time, because it is not so big). Objects that are still alive are moved to the Survivor area, and the rest are discarded, freeing up Eden for the next generation. Only when an Eden collection does not free enough memory does the garbage collector move on to the older generations (which takes more work).

Thilo
  • What means does the Hotspot JVM use to detect which newer objects are referenced within older ones? Does it use write barriers to tag older objects that have been modified since the last time Eden was swept, or do something else? By my understanding of the .net GC, it relies upon the fact that an object which has not been modified since the last gen-0 collection cannot hold any *direct or indirect* references to gen-0 objects, and an object which has not been modified since the last gen-1 collection likewise cannot hold any *direct or indirect* references to gen0 or gen1 objects. – supercat Aug 14 '12 at 23:30
1

All generational GCs behave that way. The basic idea is that you try to reduce the number of objects that you need to check every time you run the GC, because checking is a pretty expensive operation. So if you have millions of objects but need to check just a few, that's way better than having to check all of them. Also, a feature of GC plays into your hands: temporary objects (which can't be reached by anything anymore) have no cost during the GC run (well, let's ignore the finalize() method for now). Only objects which survive cost CPU time. Next, there is the observation that many objects are short-lived.

Therefore, objects are created in a small space (called "Eden" or "young gen"). After a while, all objects that can still be reached are copied (= expensive) out of this space, and the space is then declared empty (so Java effectively forgets about all the unreachable objects, which cost nothing because they are never copied). Over time, long-lived objects are moved to "older" spaces, and the older spaces are swept less often to reduce the GC overhead (for example, every N runs, the GC will collect an old space instead of eden).

Just to compare: if you allocate objects in C/C++, you need to call free() or delete (running the destructor) for each of them, whether the object was short-lived or not. This is one reason why GC can be faster than traditional, manual memory management.

Of course, this is a rather simplified look. Today, working on GC is at the level of compiler design (i.e. done by very few people). GCs pull all kinds of tricks to make the whole process efficient and unnoticeable. See the Wikipedia article for some pointers.

Aaron Digulla