There is no efficient way: finding garbage still requires traversing the heap. There is a hack of sorts, though: divide the heap into smaller pieces, so that a collection does not need to scan the entire heap. This is the reason we have generational garbage collectors, so that the scanning takes less time.
This is relatively "easy" to answer when your entire application is stopped and you can analyze the graph of objects. It all starts from GC roots (I'll let you find the documentation for what these are), but basically these are "roots" that are not collected by the GC.
From here a scan starts that finds the "live" objects: objects that have a direct (or transitive) connection to these roots and are thus not reclaimable. In graph theory this is known as "coloring/traversing" your graph with 3 colors: black, grey and white. White means the object is not (yet known to be) connected to the roots, grey means it is connected but its sub-graph is not yet traversed, black means traversed and connected to the roots. So to know what exactly is dead or alive right now, you take your heap, which is all white initially, and color everything reachable from the roots black. Everything that is still white at the end is garbage. It is interesting that "garbage" is really identified by a GC by knowing what is actually alive. There are some drawings to visualize this here for example.
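To make the coloring concrete, here is a minimal sketch of the white/grey/black traversal on a toy object graph. The `Node` class, the `TriColorMark` name and the explicit `heap` list are made up for illustration; a real JVM does not represent its heap like this.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of a heap, only meant to illustrate the coloring.
final class TriColorMark {
    enum Color { WHITE, GREY, BLACK }

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Color color = Color.WHITE;               // everything starts out white
        Node(String name) { this.name = name; }
    }

    // Colors everything reachable from the roots black and
    // returns the names of the nodes that stayed white (the garbage).
    static List<String> markAndFindGarbage(List<Node> heap, List<Node> roots) {
        Deque<Node> grey = new ArrayDeque<>();
        for (Node n : heap) n.color = Color.WHITE;
        for (Node r : roots) { r.color = Color.GREY; grey.push(r); }

        while (!grey.isEmpty()) {
            Node n = grey.pop();
            for (Node child : n.refs) {
                if (child.color == Color.WHITE) { // first time we reach it
                    child.color = Color.GREY;
                    grey.push(child);
                }
            }
            n.color = Color.BLACK;                // all of its references were scanned
        }

        List<String> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (n.color == Color.WHITE) garbage.add(n.name);
        }
        return garbage;
    }

    static List<String> demo() {
        Node a = new Node("a"), b = new Node("b"), c = new Node("c"), d = new Node("d");
        a.refs.add(b);   // root a -> b
        c.refs.add(d);   // c and d only reference each other,
        d.refs.add(c);   // a cycle does not keep them alive
        return markAndFindGarbage(List.of(a, b, c, d), List.of(a));
    }
}
```

In `demo()`, only `a` is a root, so `c` and `d` stay white and are reported as garbage even though they reference each other.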
But this is the simple scenario: when your application is entirely stopped (for seconds at times) and you can scan the heap. This is called an STW (stop-the-world) event, and people usually hate these. This is what parallel collectors do: stop everything, do whatever the GC has to do (including finding garbage), and let the application threads start again after that.
What happens when your app is running and you are scanning the heap concurrently? G1/CMS do this. Think about it: how can you reason about a leaf of the graph being alive or not when your app can change that leaf from a different thread?
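A small sketch of what can go wrong, replayed on a single thread so that the interleaving is deterministic. The names (`LostObjectDemo`, `Node`) are hypothetical, and the "marker" and "mutator" steps are simply written one after another in the order a bad schedule would run them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded replay of a marker/mutator interleaving.
final class LostObjectDemo {
    static final class Node {
        Node ref;          // one outgoing reference keeps the model small
        boolean marked;    // true = grey or black, false = white
    }

    // Returns whether c ended up marked.
    static boolean run() {
        Node a = new Node(), b = new Node(), c = new Node();
        a.ref = b;         // a -> b -> c, with a being the only root
        b.ref = c;
        Deque<Node> grey = new ArrayDeque<>();

        // Marker, step 1: scan the root a. a is now black, b is grey.
        a.marked = true;
        b.marked = true;
        grey.push(b);

        // Mutator runs in between: the only reference to c moves from b to a.
        a.ref = c;
        b.ref = null;

        // Marker, step 2: finish the grey nodes. a is black and never rescanned.
        while (!grey.isEmpty()) {
            Node n = grey.pop();
            if (n.ref != null && !n.ref.marked) {
                n.ref.marked = true;
                grey.push(n.ref);
            }
        }
        return c.marked;   // c is reachable through a, yet it is still white
    }
}
```

Here `run()` returns `false`: `c` is alive but would be treated as garbage, which is exactly the kind of mistake a concurrent collector has to prevent.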
Shenandoah, for example, solves this by "intercepting" changes to the graph. While running concurrently with your application, it catches these changes and records them in special thread-local queues, called SATB queues (snapshot-at-the-beginning queues), so that the marker does not lose track of them. When the concurrent phase is finished, a very short STW event occurs and these queues are drained. Still under the STW, whatever that drain has "caused" is computed, i.e. extra coloring of the graph. This is far simplified, just FYI. G1 also uses SATB marking, while CMS does it differently AFAIK.
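Here is a rough sketch of the SATB idea, again replayed on one thread. It assumes the barrier records the old value of a reference right before it is overwritten, which is how snapshot-at-the-beginning is usually described. `SatbSketch`, `writeRef` and `finalMark` are made-up names, a single queue stands in for the thread-local ones, and none of this mirrors actual Shenandoah code.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of SATB marking; not taken from any real collector.
final class SatbSketch {
    static final class Node {
        Node ref;
        boolean marked;    // true = grey or black, false = white
    }

    private final Deque<Node> satbQueue = new ArrayDeque<>(); // stands in for a thread-local queue
    private final Deque<Node> grey = new ArrayDeque<>();
    private boolean markingActive;

    // Write barrier: before overwriting a reference while marking is running,
    // remember the old value so the marker can still reach it later.
    void writeRef(Node holder, Node newValue) {
        if (markingActive && holder.ref != null) {
            satbQueue.add(holder.ref);
        }
        holder.ref = newValue;
    }

    private void shade(Node n) {
        if (n != null && !n.marked) {
            n.marked = true;
            grey.push(n);
        }
    }

    private void drainGrey() {
        while (!grey.isEmpty()) shade(grey.pop().ref);
    }

    // The short final pause: drain the queue and do the extra coloring.
    void finalMark() {
        while (!satbQueue.isEmpty()) shade(satbQueue.poll());
        drainGrey();
        markingActive = false;
    }

    // Returns whether c ended up marked.
    static boolean run() {
        SatbSketch gc = new SatbSketch();
        Node a = new Node(), b = new Node(), c = new Node();
        a.ref = b;             // a -> b -> c, with a being the only root
        b.ref = c;

        gc.markingActive = true;
        a.marked = true;       // marker scans the root a: a is black,
        gc.shade(b);           // b is grey

        gc.writeRef(a, c);     // mutator: barrier logs the old value b
        gc.writeRef(b, null);  // mutator: barrier logs the old value c

        gc.drainGrey();        // concurrent marking finishes, c is still white
        gc.finalMark();        // draining the queue colors c
        return c.marked;
    }
}
```

With the barrier in place `run()` returns `true`: the reference to `c` that was overwritten during marking sits in the queue, and draining it in the final pause colors `c`.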
So in theory the process is not really that complicated; implementing it concurrently is the challenging part.