4

I have a web app running on tomcat which has max heap size set to 8GB.

If no users log into the app, non-cleanable memory (which resides after a garbage collection) is quite low at about 1GB. In this situaiton I see constant memory growth for about 4GB and then garbage collector runs and memory again drops to about 1 GB. This pattern continues if no user gets logged in.

The GC logs show a Full GC takes 11 seconds, a considerable amount of time, compared to minor GC, which takes ~1 second:

2017-02-14T15:30:44.553+0530: 591.922: [GC (Allocation Failure) [PSYoungGen: 1501051K->631966K(1833472K)] 2392189K->1523112K(3030016K), 1.5100144 secs] 
    ...[Times: user=1.49 sys=0.01, real=1.51 secs]

2017-02-14T15:31:20.335+0530: 627.705: [GC (Allocation Failure) [PSYoungGen: 1553054K->595007K(1842176K)] 2444200K->1570521K(3038720K), 1.3050284 secs] 
    ...[Times: user=1.27 sys=0.04, real=1.31 secs]

2017-02-14T15:33:33.682+0530: 761.052: [GC (Allocation Failure) [PSYoungGen: 1516095K->556800K(1842176K)] 2491609K->1596474K(3038720K), 1.6957154 secs] 
    ...[Times: user=1.67 sys=0.03, real=1.69 secs]

2017-02-14T15:33:35.378+0530: 762.748: [Full GC (Ergonomics) [PSYoungGen: 556800K->365446K(1842176K)] [ParOldGen: 1039673K->1196476K(2018304K)] 1596474K->1561923K(3860480K), [Metaspace: 70472K->70472K(1114112K)], 11.2779843 secs] 
    ...[Times: user=11.13 sys=0.09, real=11.28 secs]

2017-02-14T15:34:56.232+0530: 843.602: [GC (Allocation Failure) [PSYoungGen: 1286534K->216613K(1842176K)] 2483011K->1609875K(3860480K), 1.4938761 secs] 
    ...[Times: user=1.45 sys=0.05, real=1.50 secs]

Because all other threads gets suspended during GC, if a user tries to access the webapp during a Full GC, server does not respond. What triggers this Full GC?

According to the log, minor GC events are due to allocation failures, and Full GC is due to Ergonomics. What does this mean?

There is plenty of free heap space left, and I want to delay Full GC from happening until there is no significant memory reduction due to minor GC. Can I prevent Full GC from happening in this case?

My VM args are below:

export CATALINA_OPTS="$CATALINA_OPTS -Xms512m -Xmx8192m -XX:+UseConcMarkSweepGC"
Scott Skiles
  • 2,933
  • 2
  • 26
  • 50
Lahiru Chandima
  • 16,832
  • 14
  • 78
  • 137
  • 1
    Could you please add all GC-relevant VM-options you are using(Xms, MaxGCPauseMillis, which collectors,...) plus the physical memory of your server. If you want low pause times delaying the full gcs is not the way to go as the pauses will be fewer but longer. So it seems to me your heap is configured too large for the goals you want to achieve (plus you are using the wrong collection-algorithm). – piet.t Feb 14 '17 at 13:30
  • @piet.t, vm options added to question – Lahiru Chandima Feb 14 '17 at 13:46
  • It is possible that the application is doing explicit `System.gc()` calls somewhere in the code. I've seen the CMS collector go a bit crazy and always execute a stop-the-world full GC after a call to `System.gc()`. You can make the GC ignore such calls through the `-XX:+DisableExplicitGC` JVM option. – Mick Mnemonic Feb 15 '17 at 00:33
  • @MickMnemonic, My app doesn't have any Sysgem.gc() calls, but I am not sure about tomcat. Anyway, if I do a manual GC through jconsole, the cause is shown as `System.gc()` in GC logs. In normal operation, GC log doesn't have any such entry. – Lahiru Chandima Feb 15 '17 at 00:49
  • 1
    If your application is doing RMI/RPC, then at least the JDK will be doing explicit `System.gc()` calls. It might be worth a shot to try the above option or a safer variant that doesn't ignore the calls, but instead invokes the CMS collector instead of full GC: `-XX:+ExplicitGCInvokesConcurrent`. – Mick Mnemonic Feb 15 '17 at 01:03

2 Answers2

1

I would point you to the Parallel Collector link. The "ergonomics" is a method for auto tune the collector with the specific behavior of an application.

Most of the time the auto-tuning is fine. In your case it seems that it ends in too long GC. You can fix it by adjusting yourself the parameter of the collector.

Quoting the documentation :

Maximum Garbage Collection Pause Time

The maximum pause time goal is specified with the command-line option -XX:MaxGCPauseMillis=. This is interpreted as a hint that pause times of milliseconds or less are desired; by default, there is no maximum pause time goal. If a pause time goal is specified, the heap size and other parameters related to garbage collection are adjusted in an attempt to keep garbage collection pauses shorter than the specified value. These adjustments may cause the garbage collector to reduce the overall throughput of the application, and the desired pause time goal cannot always be met.

Throughput

The throughput goal is measured in terms of the time spent doing garbage collection versus the time spent outside of garbage collection (referred to as application time). The goal is specified by the command-line option -XX:GCTimeRatio=, which sets the ratio of garbage collection time to application time to 1 / (1 + ).

For example, -XX:GCTimeRatio=19 sets a goal of 1/20 or 5% of the total time in garbage collection. The default value is 99, resulting in a goal of 1% of the time in garbage collection.

Footprint

Maximum heap footprint is specified using the option -Xmx. In addition, the collector has an implicit goal of minimizing the size of the heap as long as the other goals are being met.

Community
  • 1
  • 1
alain.janinm
  • 19,035
  • 10
  • 56
  • 103
  • -XX:MaxGCPauseMillis looks like a good option for me, but would it be able to minimize the time taken for a full GC? In my case, minor GC runs pretty fast, but full GC takes about 10 times longer. I have doubts whether -XX:MaxGCPauseMillis would be able to take this down to around 1 second. – Lahiru Chandima Feb 14 '17 at 14:11
  • @LahiruChandima No of course, it will try to do it's best to respect the time setted, but it's just a hint. If you keep having issue after customizing the GC, you should investigate why there are so much objects to garbage, and why it can't be done in more smaller timeframe. – alain.janinm Feb 14 '17 at 14:30
  • I will add -XX:MaxGCPauseMillis and check. Actually I don't think the higher duration for full GC is due to the large number of object count. As I observed through jconsole, when minor GC happens, it removes about 3GB of garbage objects in less than 2 seconds. – Lahiru Chandima Feb 14 '17 at 15:06
  • Interresting, please let me know the result after tweaking the configuration ;) – alain.janinm Feb 14 '17 at 15:22
  • @LahiruChandima Note that the minor GC usually uses a copy-collector. the execution time of this collector depends only on the amount of live objects, the amount of garbage is irrelevant. – piet.t Feb 16 '17 at 14:53
0

Addressing your primary problem (server stalls due excessive time spent for garbage collection), this may be due wrong garbage collector used for some reason. Garbage collection tuning is documented here. With Garbage-First collector you can set any pause you still find acceptable (default is 200 ms that should not be a big problem for a server).

Audrius Meskauskas
  • 18,378
  • 9
  • 63
  • 78