22

I understand that in Java, once an object no longer has any references to it, the garbage collector will reclaim it some time later.

But how does the garbage collector know whether an object still has references to it?

Does the garbage collector use some kind of hashmap or table?


Edit:

Please note that I am not asking how GC works in general. Really, I am not asking that.

I am asking specifically how the GC knows, efficiently, which objects are live and which are dead.

That's why I ask in my question whether the GC maintains some kind of hashmap or set and constantly updates the number of references each object has.

Jackson Tale
  • 1
    possible duplicate of [Theory and algorithm behind Java garbage collection](http://stackoverflow.com/questions/4141237/theory-and-algorithm-behind-java-garbage-collection) – Ernest Friedman-Hill May 14 '12 at 17:05
  • @ErnestFriedman-Hill no, this question is not a duplicate of http://stackoverflow.com/questions/4141237/theory-and-algorithm-behind-java-garbage-collection. I am not asking about the underlying theory of garbage collection. Instead, I am asking specifically how the garbage collector keeps track of the number of references an object currently has, so that it can later easily decide whether to reclaim it or not. – Jackson Tale May 15 '12 at 10:55
  • That is precisely "the underlying theory for garbage collection". – Ernest Friedman-Hill May 15 '12 at 11:21
  • @ErnestFriedman-Hill the point is that that post does not answer my question. – Jackson Tale May 15 '12 at 11:24

5 Answers

12

A typical modern JVM uses several different types of garbage collectors.

One type that's often used for objects that have been around for a while is called Mark-and-Sweep. It basically involves starting from known "live" objects (the so-called garbage collection roots), following all chains of object references, and marking every reachable object as "live".

Once this is done, the sweep stage can reclaim those objects that haven't been marked as "live".

For this process to work, the JVM has to know the location in memory of every object reference. This is a necessary condition for a garbage collector to be precise (which Java's is).
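To make the two phases concrete, here is a minimal sketch of mark-and-sweep over a toy object graph. It is only an illustration under stated assumptions: `Obj`, `mark`, `sweep` and `demo` are hypothetical names invented for this example, and a real JVM works on raw heap memory with far more elaborate bookkeeping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Toy model only: a real JVM walks raw heap memory, not Java collections.
final class MarkSweepSketch {

    // Hypothetical stand-in for a heap object: a name, outgoing references, a mark bit.
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked;

        Obj(String name) { this.name = name; }
    }

    // Mark phase: follow every chain of references starting from the roots.
    static void mark(List<Obj> roots) {
        Deque<Obj> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Obj o = pending.pop();
            if (o.marked) continue;      // already visited; this also makes cycles terminate
            o.marked = true;
            pending.addAll(o.refs);
        }
    }

    // Sweep phase: remove every unmarked object and return the reclaimed names.
    // Survivors get their mark bit cleared, ready for the next cycle.
    static List<String> sweep(List<Obj> heap) {
        List<String> reclaimed = new ArrayList<>();
        for (Iterator<Obj> it = heap.iterator(); it.hasNext(); ) {
            Obj o = it.next();
            if (o.marked) {
                o.marked = false;
            } else {
                reclaimed.add(o.name);
                it.remove();
            }
        }
        return reclaimed;
    }

    // a -> b is reachable from the root a; c <-> d is an unreachable cycle; e is isolated.
    static List<String> demo() {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c"), d = new Obj("d"), e = new Obj("e");
        a.refs.add(b);
        c.refs.add(d);
        d.refs.add(c);
        List<Obj> heap = new ArrayList<>(Arrays.asList(a, b, c, d, e));
        mark(Collections.singletonList(a));
        return sweep(heap);
    }

    public static void main(String[] args) {
        System.out.println("reclaimed: " + demo());
    }
}
```

Note that nothing in this sketch counts references: an object survives only because the traversal reached it.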

NPE
  • So you mean that every time the GC is going to do a clean-up, it will scan all in-memory objects (following the Mark-and-Sweep process)? Is that efficient? – Jackson Tale May 15 '12 at 10:56
  • I mean, if I have millions of objects in memory, I would imagine that a full scan each time might not be efficient. – Jackson Tale May 15 '12 at 11:00
  • @JacksonTale: The only way to know whether or not something is a problem is to measure it. This is what profilers are for (they can tell you quite a lot about what's going on with GC.) – NPE May 15 '12 at 11:05
6

Java has a variety of different garbage collection strategies, but they all basically work by keeping track of which objects are reachable from known active objects.

A great summary can be found in the article How Garbage Collection works in Java but for the real low-down, you should look at Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine

An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.

Beginning with the J2SE Platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work.

The most important of these observed properties is infant mortality. ...

I.e. many objects like iterators only live for a very short time, so younger objects are more likely to be eligible for garbage collection than much older objects.
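If you want to see how your own JVM splits this work up, one hedged way is to ask the standard `java.lang.management` API which collectors it reports. The sketch below only prints whatever the running JVM exposes; the collector names and the memory pools they manage differ between JVMs and command-line options, so no particular output is assumed. `GcBeansSketch` and `describeCollectors` are names made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcBeansSketch {

    // One line per collector the running JVM reports, with its collection count so far
    // and the memory pools it is responsible for.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", pools=" + String.join(", ", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so that collections have a chance to happen.
        long total = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] shortLived = new byte[1024];
            total += shortLived.length;
        }
        System.out.println("allocated bytes: " + total);
        for (String line : describeCollectors()) {
            System.out.println(line);
        }
    }
}
```

On a generational collector you would typically expect the collector handling the young pools to report many more collections than the one handling the old pool, but that is something to observe on your own setup rather than take from this sketch.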

For more up to date tuning guides, take a look at:

Incidentally, be careful about trying to second-guess your garbage collection strategy. I've known many a program's performance to be trashed by overzealous use of System.gc() or inappropriate -XX options.

Mark Booth
2

The GC will work out that an object can be removed as soon as it is able to. You are not expected to manage this process.

But you can ask the GC very politely to run by calling System.gc(). It is just a hint to the system: the GC does not have to run at that moment, it does not have to remove your specific object, etc. Because GC is the BIG boss and we (Java programmers) are just its slaves... :(
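A small sketch of what "just a hint" means in practice, using `java.lang.ref.WeakReference` to observe the collector from the outside. The class and method names are invented for this example. The only thing that can be relied on is that an object which is still strongly reachable is never cleared; whether the unreferenced object is actually gone after `System.gc()` is up to the JVM.

```java
import java.lang.ref.WeakReference;

final class GcHintSketch {

    // Returns { keptStillAlive, droppedWasCleared }.
    // Only the first value is guaranteed; the second depends on what the JVM decides to do.
    static boolean[] run() {
        Object kept = new Object();                         // still strongly referenced below
        WeakReference<Object> keptRef = new WeakReference<>(kept);
        WeakReference<Object> droppedRef = new WeakReference<>(new Object()); // no strong reference

        System.gc();                                        // a request, not an order

        boolean keptStillAlive = keptRef.get() == kept;     // 'kept' is used here, so it stayed reachable
        boolean droppedWasCleared = droppedRef.get() == null;
        return new boolean[] { keptStillAlive, droppedWasCleared };
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("strongly reachable object still alive: " + result[0]);
        System.out.println("unreferenced object already cleared:   " + result[1]);
    }
}
```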

AlexR
  • 2
    +1 to the GC reign. The Garbage Chucknorris only receives issues but it never takes orders. It's like requesting a service and getting an answer whenever the server decides. – Fritz May 14 '12 at 17:13
0

There is no truly efficient way: it still requires traversing the heap. But there is a trick: divide the heap into smaller pieces, so that the entire heap does not have to be scanned every time. This is the reason we have generational garbage collectors: the scanning takes less time.

This is relatively "easy" to answer when your entire application is stopped and you can analyze the graph of objects. It all starts from GC roots (I'll let you find the documentation for what these are), but basically these are "roots" that are not collected by the GC.

From here a scan starts that identifies the "live" objects: objects that have a direct (or transitive) connection to these roots and are therefore not reclaimable. In graph theory this is known as "coloring/traversing" your graph using 3 colors: black, grey and white. White means not (yet) found to be connected to the roots, grey means reached but its sub-graph is not yet traversed, black means traversed and connected to the roots. So to know what exactly is dead/alive right now, you take your heap, which is all white initially, and color everything reachable black. Everything that is still white at the end is garbage. It is interesting that "garbage" is really identified by a GC by knowing what is actually alive. There are some drawings to visualize this here for example.
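The coloring described above can be sketched in a few lines. This is a simplified, stop-the-world illustration with made-up names (`TriColorSketch`, `findGarbage`) over a graph given as a plain map from node name to the names it references; it is not how any particular collector stores its colors.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TriColorSketch {

    enum Color { WHITE, GREY, BLACK }

    // Returns the nodes that are still white once no grey node is left.
    static Set<String> findGarbage(Map<String, List<String>> edges, List<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String node : edges.keySet()) color.put(node, Color.WHITE);   // everything starts white

        Deque<String> grey = new ArrayDeque<>();
        for (String root : roots) {                                        // roots are reached first
            color.put(root, Color.GREY);
            grey.push(root);
        }

        while (!grey.isEmpty()) {
            String node = grey.pop();
            for (String target : edges.getOrDefault(node, Collections.emptyList())) {
                if (color.getOrDefault(target, Color.WHITE) == Color.WHITE) {
                    color.put(target, Color.GREY);                         // reached, not yet scanned
                    grey.push(target);
                }
            }
            color.put(node, Color.BLACK);                                  // all outgoing edges scanned
        }

        Set<String> garbage = new TreeSet<>();
        for (Map.Entry<String, Color> entry : color.entrySet()) {
            if (entry.getValue() == Color.WHITE) garbage.add(entry.getKey());
        }
        return garbage;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = new LinkedHashMap<>();
        edges.put("root", Arrays.asList("a"));
        edges.put("a", Arrays.asList("b"));
        edges.put("b", Arrays.asList("a"));          // cycle, but reachable from root
        edges.put("x", Arrays.asList("y"));
        edges.put("y", Arrays.asList("x"));          // cycle, not reachable
        edges.put("z", Collections.<String>emptyList());
        System.out.println("garbage: " + findGarbage(edges, Arrays.asList("root")));
    }
}
```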

But this is the simple scenario: your application is entirely stopped (for seconds at times) and you can scan the heap. This is called an STW (stop-the-world) event, and people usually hate these. This is what parallel collectors do: stop everything, do whatever the GC has to do (including finding garbage), and let the application threads resume after that.

What happens when your app is running and you are scanning the heap concurrently? G1/CMS do this. Think about it: how can you reason about whether a leaf of a graph is alive or not when your app can change that leaf from a different thread?

Shenandoah, for example, solves this by "intercepting" changes to the graph. While running concurrently with your application, it catches all the changes and inserts them into special thread-local queues, called SATB queues (snapshot-at-the-beginning queues), instead of altering the heap directly. When that is finished, a very short STW event occurs and these queues are drained. Still under the STW, what that drain has "caused" is computed, i.e. extra coloring of the graph. This is greatly simplified, just FYI. G1 and CMS do it differently AFAIK.
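To give a feel for why such a queue is needed, here is a deliberately tiny, single-threaded simulation. Everything in it is an assumption made for illustration: the `Node` type, the `writeNext` "barrier" and the single shared queue are hypothetical, and the sketch uses one common formulation of a snapshot-at-the-beginning barrier (remember the reference that is about to be overwritten). Real collectors use per-thread queues and barriers emitted by the JIT compiler.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class SatbSketch {

    // Hypothetical heap object with a single outgoing reference.
    static final class Node {
        final String name;
        Node next;
        boolean marked;

        Node(String name) { this.name = name; }
    }

    private boolean markingActive;
    private final Deque<Node> satbQueue = new ArrayDeque<>();

    // Hypothetical write barrier: while marking is in progress, remember the reference
    // that is about to be overwritten, then perform the store.
    void writeNext(Node holder, Node newValue, boolean useBarrier) {
        if (useBarrier && markingActive && holder.next != null) {
            satbQueue.add(holder.next);
        }
        holder.next = newValue;
    }

    // Marks along a chain until it hits null or an already marked node.
    private static void markFrom(Node start) {
        for (Node n = start; n != null && !n.marked; n = n.next) {
            n.marked = true;
        }
    }

    // Simulates an interleaving of marker and application; returns whether b ends up marked.
    boolean survives(boolean useBarrier) {
        Node root = new Node("root"), a = new Node("a"), b = new Node("b");
        root.next = a;
        a.next = b;

        markingActive = true;
        root.marked = true;               // marker has visited root, but not yet a

        Node local = a.next;              // application grabs b into a local variable...
        writeNext(a, null, useBarrier);   // ...and unlinks it from the heap graph

        markFrom(root.next);              // marker resumes: marks a, then finds null

        while (!satbQueue.isEmpty()) {    // short final pause: drain the queue
            markFrom(satbQueue.poll());
        }
        markingActive = false;
        return local.marked;              // b is still in use by the application
    }

    public static void main(String[] args) {
        System.out.println("with barrier, b marked:    " + new SatbSketch().survives(true));
        System.out.println("without barrier, b marked: " + new SatbSketch().survives(false));
    }
}
```

Without the barrier the marker never sees `b`, even though the application still holds it, which is exactly the kind of mistake a concurrent collector must not make.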


So in theory, the process is not really that complicated, but implementing it concurrently is the most challenging part.

Eugene
0

The truth is that the garbage collector does not, in general, quickly know which objects no longer have any incoming references. And, in fact, an object can be garbage even when there are incoming references to it.

The garbage collector uses a traversal of the object graph to find the objects that are reachable. Objects that are not reached in this traversal are deemed garbage, even if they are part of a cycle of references. The delay between an object being unreachable, and the garbage collector actually collecting the object, could be arbitrarily long.
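A small sketch of why counting references would not be enough, assuming a toy heap where objects are just indices and `edges[i]` lists the objects that object `i` references (all names here are invented for the example). The two objects in the cycle each have one incoming reference, yet neither is reachable from the root.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

final class CycleSketch {

    // Number of references from other heap objects that point at each object.
    static int[] incomingCounts(int[][] edges) {
        int[] counts = new int[edges.length];
        for (int[] targets : edges) {
            for (int target : targets) counts[target]++;
        }
        return counts;
    }

    // Which objects a traversal starting at the roots actually reaches.
    static boolean[] reachable(int[][] edges, int[] roots) {
        boolean[] seen = new boolean[edges.length];
        Deque<Integer> pending = new ArrayDeque<>();
        for (int root : roots) pending.push(root);
        while (!pending.isEmpty()) {
            int object = pending.pop();
            if (seen[object]) continue;
            seen[object] = true;
            for (int target : edges[object]) pending.push(target);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Object 0 is held by a root and references 1; objects 2 and 3 reference each other.
        int[][] edges = { {1}, {}, {3}, {2} };
        int[] roots = {0};
        System.out.println("incoming counts: " + Arrays.toString(incomingCounts(edges)));
        System.out.println("reachable:       " + Arrays.toString(reachable(edges, roots)));
    }
}
```

Objects 2 and 3 would never drop to a count of zero, but the traversal simply never visits them, so a tracing collector treats them as garbage.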

andru