12

I'm seeing the following symptoms on an application's GC log file with the Concurrent Mark-Sweep collector:

4031.248: [CMS-concurrent-preclean-start]
4031.250: [CMS-concurrent-preclean: 0.002/0.002 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
4031.250: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 4036.346: [CMS-concurrent-abortable-preclean: 0.159/5.096 secs] [Times: user=0.00 sys=0.01, real=5.09 secs] 
4036.346: [GC[YG occupancy: 55964 K (118016 K)]4036.347: [Rescan (parallel) , 0.0641200 secs]4036.411: [weak refs processing, 0.0001300 secs]4036.411: [class unloading, 0.0041590 secs]4036.415: [scrub symbol & string tables, 0.0053220 secs] [1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] 

The preclean process keeps aborting continously. I've tried adjusting CMSMaxAbortablePrecleanTime to 15 seconds, from the default of 5, but that has not helped. The current JVM options are as follows...

Djava.awt.headless=true
 -Xms512m
 -Xmx512m
 -Xmn128m
 -XX:MaxPermSize=128m
 -XX:+HeapDumpOnOutOfMemoryError
 -XX:+UseParNewGC
 -XX:+UseConcMarkSweepGC
 -XX:BiasedLockingStartupDelay=0
 -XX:+DoEscapeAnalysis
 -XX:+UseBiasedLocking
 -XX:+EliminateLocks
 -XX:+CMSParallelRemarkEnabled
 -verbose:gc
 -XX:+PrintGCTimeStamps
 -XX:+PrintGCDetails
 -XX:+PrintHeapAtGC
 -Xloggc:gc.log
 -XX:+CMSClassUnloadingEnabled
 -XX:+CMSPermGenPrecleaningEnabled
 -XX:CMSInitiatingOccupancyFraction=50
 -XX:ReservedCodeCacheSize=64m
 -Dnetworkaddress.cache.ttl=30
 -Xss128k

It appears the concurrent-abortable-preclean never gets a chance to run. I read through https://blogs.oracle.com/jonthecollector/entry/did_you_know which had a suggestion of enabling CMSScavengeBeforeRemark, but the side effects of pausing did not seem ideal. Could anyone offer up any suggestions?

Also I was wondering if anyone had a good reference for grokking the CMS GC logs, in particular this line:

[1 CMS-remark: 16015K(393216K)] 71979K(511232K), 0.0746640 secs]

Not clear on what memory regions those numbers are referring to. Edit Found a link to this http://www.sun.com/bigadmin/content/submitted/cms_gc_logs.jsp

tomger
  • 529
  • 8
  • 15
jlintz
  • 453
  • 3
  • 8
  • The cms tag is used to refer to content-management systems, not concurrent mark and sweep GCs. I will remove it. – Michael Myers May 25 '10 at 16:55
  • whoops sorry about that, thanks – jlintz May 25 '10 at 16:58
  • Initiating CMS at 50% seems kind of low to me: -XX:CMSInitiatingOccupancyFraction=50 Maybe increasing it (or using the defaults as 'antispam' suggests) will behave differently. Also, my logs usually have ParNew running them before, during, and after CMS. Is ParNew running? – Joshua Davis Jun 24 '10 at 16:19

3 Answers3

3

[Times: user=0.00 sys=0.01, real=5.09 secs]

I would try investigate why CMS-concurrent-abortable-preclean-start doesn't get neither user nor sys CPU time in 5 seconds.

My suggestion is starting from a 'clean' JVM CMS startup flags like

-Djava.awt.headless=true
-Xms512m
-Xmx512m
-Xmn128m
-Xss128k
-XX:MaxPermSize=128m
-XX:+UseConcMarkSweepGC
-XX:+HeapDumpOnOutOfMemoryError
-Xloggc:gc.log
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+PrintHeapAtGC

then check if the problem reproduces and keep tweaking one parameter at a time.

fglez
  • 7,964
  • 4
  • 43
  • 75
3

As someone has already mentioned, the first step would be to increase the CMSInitiatingOccupancyFraction.

As a second step, I would use the flag -XX:-PrintTenuringDistribution and make sure that there is no premature promotion from the young generation to the old one. This would lead to old-to-young references which might lead to a longer abortable preclean phase. If there is such a premature promotion, try to adjust the ratio between the eden and the survior spaces.

2

There is a good explanation here about this phenomenon:

Quote:

So when the system load is light(which means there will be no minor gc), precleaning will always time out and full gc will always fail. cpu is waste.

It won't fail. It'll be less parallel (i.e. less efficient, and would have a longer pause time, for lesser work).

So all in all: this seems to be normal operation - the thread just waits for a minor GC to happen for 5 seconds, but there is no big issue when this does not happen: the JVM chooses a different (less efficient) strategy to continue with the GC.

ppeterka
  • 19,921
  • 6
  • 58
  • 74