Can a short-lived object that references a long-lived object increase GC time?

Question

I need some clarification about how minor GC collections behave: in a long-lived application, could calling a() or calling b() behave worse as the old space gets bigger?

// An Example instance lives for the whole application life cycle (24x7)
public class Example {

    private Object longLived = new Object();

    public void a() {
        var shortLived = new ShortLivedObject(longLived); // longLived is now a field of shortLived
        shortLived.doSomething();
    }

    public void b() {
        new ShortLivedObject().doSomething(new Object()); // the argument is now short-lived too
    }

}

Where does my doubt come from? I found that in an app where the used tenured space keeps growing, minor GC pauses get longer.

Running some tests, I found that if I force one JVM to use option a() and another JVM to use option b(), the JVM using b() has shorter pauses when the old space gets bigger, but I can't figure out why.
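A minimal sketch of how such a comparison could be instrumented from inside the JVM, assuming the standard java.lang.management API is available. The class name GcStats and the allocation loop are hypothetical stand-ins, not code from the application. Note that getCollectionTime() reports accumulated elapsed collection time rather than individual pause durations, so pause percentiles (p99, p99.9) would still have to come from the GC logs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: sums what the JVM's collector beans report.
final class GcStats {

    // Total number of collections across all collectors in this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollections();
        long millisBefore = totalCollectionMillis();

        // Stand-in workload: replace with repeated calls to a() or b().
        long sink = 0;
        for (int i = 0; i < 5_000_000; i++) {
            sink += new byte[64].length;
        }

        System.out.println("allocated bytes (approx): " + sink);
        System.out.println("collections during run: " + (totalCollections() - countBefore));
        System.out.println("collection time (ms): " + (totalCollectionMillis() - millisBefore));
    }
}
```

Running the same harness once per variant, with identical heap settings and old-space occupancy, would at least show whether the two variants differ in total collection time.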

I solved the issue in the app by setting -XX:ParGCCardsPerStrideChunk to 4096, but I want to know whether the situation I described above can increase GC times because scanning the card table is slower, whether it is caused by something I don't know about, or whether it is not related at all.

In general, you should not expect that these are different at all. — Louis Wasserman, Nov 19 '19 at 00:35
@LouisWasserman It happens when the old space gets bigger, but perhaps you are correct and this example is not related to what is really happening; I was looking for some clarification. I noticed a difference in the 99th and 99.9th percentiles of response times — nachokk, Nov 19 '19 at 00:57
The _biggest_ problem here is that you are using a deprecated garbage collector, `CMS`. You _need_ to switch to `G1` (or even better, _Shenandoah_) and see what happens there. The second problem is that I doubt you know for a fact that `LongLivedObject` is _actually_ a _long_-lived object: is it referenced by a GC root? The third is that you are confusing terminology a lot: `CMS` has a STW pause for the _young_ generation and two short pauses in the old generation, among a lot of other things that you confuse. — Eugene, Nov 19 '19 at 14:34
you also have asked 4 different questions that I can count here, all of them pretty generic - thus your best chance is to get a generic answer. — Eugene, Nov 19 '19 at 14:41
@Eugene I know that it is a long-lived object. I'm not confusing anything; I'm always focusing on minor GC with ParNew. I'm not looking for a flag, I'm just trying to understand whether using option `b()` is more performant than using `a()`. Perhaps I'm not being clear and I have to improve the question — nachokk, Nov 19 '19 at 15:07
What do you mean, you _know_ it's a long-lived object? It does not matter if you pass it as a parameter or a method argument; reachability has nothing to do with that. — Eugene, Nov 19 '19 at 15:38
If `a` and `b` are truly viable alternatives in your application, it's an indicator that `LongLivedObject` is entirely obsolete. It's not holding state that deserves to be held and apparently can be reconstructed from nothing without any impact. — Holger, Nov 19 '19 at 15:41
@Holger That's certainly true, but I want to know which option is better (more performant), or whether I should not expect them to be different — nachokk, Nov 19 '19 at 15:44
@Holger I don't know; I want to know which is more performant in terms of GC, or whether there is no real difference, and why one option is better than the other, because I can't figure it out — nachokk, Nov 19 '19 at 16:58
@Eugene I think you misunderstood my example, because I only pass it as an argument to be type-safe. I will change the example; I put a comment saying it is short-lived, but perhaps it is not clear — nachokk, Nov 19 '19 at 17:00
Performance can only be measured in terms of CPU cycles or memory needed to perform a particular task, but your comparison lacks the actual task. Since both constructs do something entirely different, there is no sense in this comparison. Just like it doesn’t make sense to say that a racing car is faster than a truck when your task is to transport furniture. — Holger, Nov 20 '19 at 08:26
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202716/discussion-between-nachokk-and-holger). — nachokk, Nov 20 '19 at 11:54

score 2 · Accepted Answer · answered Nov 21 '19 at 21:54

Disclaimer: I am by no means a GC expert, but I have lately been getting into these details for fun.

As I said in the comments, you are using a collector that is deprecated; no one supports it and no one wants to use it. Switch to G1 or, even better IMHO, to Shenandoah: start with this simple thing first.

I can only assume that you increased ParGCCardsPerStrideChunk from its default value and that this probably helped by a few ms (though we have no proof of that). We also have no GC logs, CPU activity data, etc., so this is pretty complicated to answer.

If you do have a big heap (tens of GB), a big young space and enough GC threads, setting that parameter to a bigger value might indeed help, and it might even have to do with the card table that you mention. Read on to see why.

CMS splits the heap into old space and young space. It could have chosen any other discriminator, but they chose age (just like G1). Why is that needed? To be able to scan and collect only part of the heap (scanning it entirely is very expensive). Young space is collected with a stop-the-world pause, so it had better be small, otherwise you will not be happy; that is also why you will usually see many more young collections compared to old ones.

The only problem when you scan young space is: what happens if there are references from old space to objects from young space? Collecting those is obviously wrong, but scanning the entire old space to find out that answer would defeat the purpose of generational collections entirely. Thus: card table.

The card table keeps track of references from old space into young space, so the collector knows exactly what is garbage and what is not. G1 uses a card table too, but also adds a RememberedSet (not going into the details here). In practice, RememberedSets turned out to be HUGE, which is why G1 became generational. (FYI: Shenandoah uses a matrix instead of a card table, making it not generational.)
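To make the idea concrete, here is a toy model of a card table, assuming one byte per 512-byte card. It is only a sketch: HotSpot's real card table and write barrier live in native code, and the class and method names here (ToyCardTable, onReferenceStore) are invented for illustration.

```java
// Toy model only; names and encoding (1 = dirty) are assumptions for illustration.
final class ToyCardTable {

    static final int CARD_SHIFT = 9; // 2^9 = 512 bytes of old space covered by each card

    private final byte[] cards; // one byte per card; 1 means "dirty"

    ToyCardTable(long oldGenBytes) {
        long cardBytes = 1L << CARD_SHIFT;
        // Round up so a partially filled last card is still covered.
        cards = new byte[(int) ((oldGenBytes + cardBytes - 1) >>> CARD_SHIFT)];
    }

    // Simulated write barrier: invoked for every reference store into old space.
    // It marks the card that covers the written address as dirty.
    void onReferenceStore(long oldGenOffset) {
        cards[(int) (oldGenOffset >>> CARD_SHIFT)] = 1;
    }

    // A minor collection only needs to look at the old-space memory behind dirty cards.
    int dirtyCards() {
        int dirty = 0;
        for (byte card : cards) {
            if (card == 1) {
                dirty++;
            }
        }
        return dirty;
    }

    int cardCount() {
        return cards.length;
    }

    public static void main(String[] args) {
        ToyCardTable table = new ToyCardTable(1024 * 1024); // 1 MB of simulated old space
        table.onReferenceStore(0);   // falls into card 0
        table.onReferenceStore(100); // also card 0
        table.onReferenceStore(512); // falls into card 1
        System.out.println(table.dirtyCards() + " dirty cards out of " + table.cardCount());
    }
}
```

The point of the structure is that the young collection scans only the parts of old space behind dirty cards instead of the whole old generation; the table itself still grows with the size of old space and has to be walked.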

So this huge intro was meant to show that increasing ParGCCardsPerStrideChunk might indeed have helped: you are giving each GC thread a bigger chunk to work on. The default value is 256 and each card in the card table covers 512 bytes, which means

256 * 512 = 128KB per stride of old generation

If, for example, you have a heap of 32 GB, how many hundreds of thousands of strides is that? Probably too many.
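The arithmetic above can be worked through in a few lines. This sketch assumes, as an upper bound, that the whole 32 GB is old generation; StrideMath and its method names are made up for the example, and 4096 is the value mentioned in the question.

```java
// Worked arithmetic only; assumes the entire 32 GB heap is old generation.
final class StrideMath {

    static final long CARD_BYTES = 512; // bytes of old space covered by one card

    // Old-generation bytes covered by one stride chunk.
    static long bytesPerStride(long cardsPerStride) {
        return cardsPerStride * CARD_BYTES;
    }

    // Number of stride chunks needed to cover the old generation, rounded up.
    static long strideCount(long oldGenBytes, long cardsPerStride) {
        long stride = bytesPerStride(cardsPerStride);
        return (oldGenBytes + stride - 1) / stride;
    }

    public static void main(String[] args) {
        long oldGen = 32L * 1024 * 1024 * 1024; // 32 GB
        System.out.println("default (256 cards): " + strideCount(oldGen, 256) + " strides");
        System.out.println("tuned (4096 cards): " + strideCount(oldGen, 4096) + " strides");
    }
}
```

Under that assumption the default gives 262,144 strides of 128 KB each, while 4096 cards per stride gives 16,384 strides of 2 MB each, i.e. far fewer chunks to hand out to the GC threads.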

Now, why do you also bring reference counting into the discussion here? I have no idea.

The examples that you have shown have different semantics and as such are kind of difficult to reason about; I'll still try to, though. You have to understand that reachability of Objects is just a graph that starts from some roots (called GC roots). Let's take this example first:

public void b(){
   new ShortLivedObject().doSomething(new Object()); // the argument is now short-lived too
}

The ShortLivedObject instance is "forgotten" as soon as the doSomething invocation is done; its scope is within the method only, so no one can reach it. The remaining part is about the parameter of doSomething: new Object(). If doSomething does not do anything fishy with the parameter it got (such as making it reachable from a GC root), then after doSomething is done, it becomes eligible for GC too. But even if doSomething makes new Object() reachable, the ShortLivedObject instance is still eligible for GC.

As such, even if Example is reachable (meaning it can't be collected), ShortLivedObject and new Object() can potentially be collected. It can look like this:

                 new Object()
                      |
                     \ /
               ShortLivedObject           
                      |
                     \ /
GC Root -> ... - > Example

You can see that when the GC scans the Example instance, it might not scan ShortLivedObject at all (that is why garbage is identified as the opposite of live objects). So a GC algorithm will simply discard that entire subgraph without scanning it.

The second example is different:

public void a(){
    var shortLived = new ShortLivedObject(longLived);
    shortLived.doSomething();
}

The difference is that longLived here is an instance field and, as such, the graph will look a bit different:

                ShortLivedObject
                      |
                     \ /
                  longLived         
                     / \
                      |
GC Root -> ... - > Example

It's obvious that ShortLivedObject can be collected in this case, but not longLived.

What you have to understand is that none of this matters if the Example instance itself can be collected: this graph will not be traversed, and everything that Example uses can be collected.
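The reachability argument for method a can be illustrated with a toy mark phase over a hand-built graph. This is a sketch under strong simplifications: the Node type and mark method are invented for illustration, and a real minor collection also treats references found through dirty cards as roots.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy mark phase; not how HotSpot represents objects or roots.
final class ToyReachability {

    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>(); // outgoing references

        Node(String name) {
            this.name = name;
        }
    }

    // Returns every node reachable from the given roots by following outgoing references.
    static Set<Node> mark(Node... roots) {
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<>();
        for (Node root : roots) {
            pending.push(root);
        }
        while (!pending.isEmpty()) {
            Node node = pending.pop();
            if (live.add(node)) { // first visit: follow its references
                pending.addAll(node.refs);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        Node example = new Node("Example");
        Node longLived = new Node("longLived");
        Node shortLived = new Node("ShortLivedObject");

        example.refs.add(longLived);    // Example.longLived field
        shortLived.refs.add(longLived); // reference created in a()

        Set<Node> live = mark(example); // Example is reachable from a GC root
        for (Node node : List.of(example, longLived, shortLived)) {
            System.out.println(node.name + " live: " + live.contains(node));
        }
    }
}
```

Only outgoing references from live objects are followed, so the reference from ShortLivedObject to longLived never makes ShortLivedObject live; it is simply never visited.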

You should be able to understand now that using method a can retain a bit more garbage, which can potentially be moved to old space (once it becomes old enough) and can potentially make your young pauses longer; increasing ParGCCardsPerStrideChunk might indeed help a bit. But this is highly speculative, and you would need a pretty bad allocation pattern, repeated consistently, for all of this to happen. Without logs, I highly doubt that.
