3

Here is a piece of GC logs that shows a full CMS GC event:

2016-12-29T22:44:34.741-0500: 27572.982: [GC (CMS Initial Mark) [1 CMS-initial-mark: 2508212K(23068672K)] 2931097K(26843584K), 0.0213349 secs] [Times: user=0.22 sys=0.00, real=0.02 secs] 
2016-12-29T22:44:34.763-0500: 27573.004: [CMS-concurrent-mark-start]
2016-12-29T22:44:36.013-0500: 27574.254: [CMS-concurrent-mark: 0.208/1.250 secs] [Times: user=3.07 sys=0.10, real=1.25 secs] 
2016-12-29T22:44:36.014-0500: 27574.255: [CMS-concurrent-preclean-start]
2016-12-29T22:44:36.061-0500: 27574.303: [CMS-concurrent-preclean: 0.047/0.048 secs] [Times: user=0.12 sys=0.00, real=0.05 secs] 
2016-12-29T22:44:36.062-0500: 27574.303: [CMS-concurrent-abortable-preclean-start]
 CMS: abort preclean due to time 2016-12-29T22:44:41.339-0500: 27579.580: [CMS-concurrent-abortable-preclean: 4.084/5.277 secs] [Times: user=9.53 sys=0.46, real=5.27 secs] 
2016-12-29T22:44:41.356-0500: 27579.598: [GC (CMS Final Remark) [YG occupancy: 3509179 K (3774912 K)]2016-12-29T22:44:41.357-0500: 27579.598: [Rescan (parallel) , 0.0816014 secs]2016-12-29T22:44:41.438-0500: 27579.680: [weak refs processing, 0.0000347 secs]2016-12-29T22:44:41.438-0500: 27579.680: [class unloading, 0.0292451 secs]2016-12-29T22:44:41.468-0500: 27579.709: [scrub symbol table, 0.0069857 secs]2016-12-29T22:44:41.475-0500: 27579.716: [scrub string table, 0.0010933 secs][1 CMS-remark: 2508212K(23068672K)] 6017391K(26843584K), 0.1194737 secs] [Times: user=2.18 sys=0.00, real=0.12 secs] 
2016-12-29T22:44:41.477-0500: 27579.718: [CMS-concurrent-sweep-start]
2016-12-29T22:44:41.618-0500: 27579.860: [GC (Allocation Failure) 2016-12-29T22:44:41.619-0500: 27579.860: [ParNew: 3579431K->419392K(3774912K), 5.7752364 secs] 6080306K->3302237K(26843584K), 5.7769753 secs] [Times: user=4.70 sys=8.50, real=5.78 secs] 
2016-12-29T22:44:47.698-0500: 27585.940: [CMS-concurrent-sweep: 0.230/6.221 secs] [Times: user=5.76 sys=8.55, real=6.22 secs] 
2016-12-29T22:44:47.699-0500: 27585.940: [CMS-concurrent-reset-start]
2016-12-29T22:44:47.717-0500: 27585.958: [GC (System.gc()) 2016-12-29T22:44:47.718-0500: 27585.959: [ParNew: 527384K->155025K(3774912K), 0.7351766 secs] 3405551K->3441841K(26843584K), 0.7366514 secs] [Times: user=5.61 sys=1.44, real=0.74 secs] 
2016-12-29T22:44:48.503-0500: 27586.744: [CMS-concurrent-reset: 0.049/0.804 secs] [Times: user=5.81 sys=1.45, real=0.81 secs] 

There seems to be a minor GC event while the CMS is running:

2016-12-29T22:44:41.618-0500: 27579.860: [GC (Allocation Failure) 2016-12-29T22:44:41.619-0500: 27579.860: [ParNew: 3579431K->419392K(3774912K), 5.7752364 secs] 6080306K->3302237K(26843584K), 5.7769753 secs] [Times: user=4.70 sys=8.50, real=5.78 secs] 

Is that OK? Does the minor GC block the full GC?

Can that explain the very high system time that we're seeing? (sys=8.55 secs, sys=8.50 secs)

benji
  • 2,026
  • 5
  • 25
  • 51

3 Answers3

1

In your case the GC algorithms used are:-

Parallel New for Young + Concurrent Mark and Sweep (CMS) for the Old Generation

There seems to be a minor GC event while the CMS is running, Is that ok?
Yes, ParNew and CMS can run parallely.

Does the minor GC block the full GC?
Yes, a concurrent CMS collection is "interrupted" by ParNew minor collections. Collections of the Young Generation can occur anytime during concurrent collecting the Old Generation. In such case the major collection will be interleaved with the Minor GC events

Can that explain the very high system time that we're seeing?
You need to check the frequency and duration of Stop-the-world events in GC.

  1. CMS intial mark
  2. CMS Final remark
  3. ParNew

By looking at your log, I can make out that your Minor GC (ParNew) is taking too much time to complete.
5.7752364 secs : Time it took for the garbage collector to mark and copy live objects in the Young Generation+communication overhead with ConcurrentMarkSweep collector

Good reference for Java GC, worth reading.

Amit Bhati
  • 5,215
  • 1
  • 19
  • 42
0

Of course young collections can occur during old collections. Usually that is not a big problem but in cases of a CMS remark phase directly following a young collection (or other way around hence both are stop-the-world pauses) it can harm the performance. The CMS will try to avoid that by calculating when the next young collection will occur, but that's just the theory.

Kevin Peters
  • 2,849
  • 1
  • 14
  • 36
0

It is absolutely OK if Young GC runs between CMS phases. Doing a young collection right before CMS initial mark or CMS remark can even reduce stop-the-world pauses of major GC.

What is NOT normal is the large sys time (CPU time spent in kernel) while doing GC. This should not normally happen since garbage collection is mostly user-space work.

Check top, perf, etc. to find what causes high system time. This is typically related to memory issues: Swapping or Transparent Huge Pages defrag.

apangin
  • 79,047
  • 9
  • 168
  • 200