7

Sometimes, on a test machine (Windows), Java full GC takes over 20 seconds. The GC logs show that the real time is very high, user is also high, but not as high (around 3 seconds). In one case, it is:

21.2209796 secs [Times: user=3.76 sys=0.45, real=21.22 secs]

What could be the reason? I have a suspicion but don't want to say right now.

What is the easiest way to find out what the root cause is? If you don't know for Windows but know for Linux, that's an option as well: the tests could be re-run on Linux.

The process was started as follows:

"%JAVA_HOME%\bin\java.exe" -verbose:gc -XX:+PrintGCDetails 
    -XX:+PrintGCDateStamps -Xloggc:publish.gc.log 
    -XX:PermSize=256m -XX:MaxPermSize=1024m -Xms1024m 
    -Xmx3072m -XX:NewSize=352m -XX:MaxNewSize=352m 
    -XX:SurvivorRatio=6 

I don't know why NewSize, XX:MaxNewSize and SurvivorRatio are used. MaxPermSize is needed because it's an OSGi environment. Full thread dumps don't show anything special.

The complete log of full GCs is (sorry, long):

2014-05-06T21:38:18.735-0700: 150.261: [Full GC [PSYoungGen: 23614K->0K(309952K)] [PSOldGen: 693188K->311405K(688128K)] 716802K->311405K(998080K) [PSPermGen: 45120K->45120K(262144K)], 1.4533481 secs] [Times: user=1.45 sys=0.01, real=1.45 secs] 
2014-05-06T21:38:58.379-0700: 189.907: [Full GC [PSYoungGen: 1664K->0K(335488K)] [PSOldGen: 668139K->283959K(688128K)] 669803K->283959K(1023616K) [PSPermGen: 46688K->46688K(262144K)], 1.3466773 secs] [Times: user=1.34 sys=0.00, real=1.35 secs] 
2014-05-06T21:39:44.082-0700: 235.613: [Full GC [PSYoungGen: 15200K->0K(309184K)] [PSOldGen: 668643K->292329K(740800K)] 683843K->292329K(1049984K) [PSPermGen: 47733K->47733K(262144K)], 1.3649206 secs] [Times: user=1.36 sys=0.00, real=1.37 secs] 
2014-05-06T21:40:33.090-0700: 284.623: [Full GC [PSYoungGen: 10132K->0K(326528K)] [PSOldGen: 718669K->308144K(858368K)] 728802K->308144K(1184896K) [PSPermGen: 49596K->43403K(262144K)], 1.2978991 secs] [Times: user=1.31 sys=0.02, real=1.30 secs] 
2014-05-06T21:41:39.407-0700: 350.943: [Full GC [PSYoungGen: 37815K->0K(308160K)] [PSOldGen: 842013K->560237K(1236160K)] 879828K->560237K(1544320K) [PSPermGen: 43807K->43807K(262144K)], 1.2742777 secs] [Times: user=1.26 sys=0.00, real=1.27 secs] 
2014-05-06T21:43:36.782-0700: 468.324: [Full GC [PSYoungGen: 640K->0K(329216K)] [PSOldGen: 1233468K->536017K(1242944K)] 1234108K->536017K(1572160K) [PSPermGen: 44438K->44438K(262144K)], 1.4105887 secs] [Times: user=1.42 sys=0.00, real=1.41 secs] 
2014-05-06T21:44:28.404-0700: 519.948: [Full GC [PSYoungGen: 45056K->0K(315392K)] [PSOldGen: 1196082K->481770K(1266240K)] 1241138K->481770K(1581632K) [PSPermGen: 56883K->56883K(262144K)], 1.5773390 secs] [Times: user=1.58 sys=0.00, real=1.58 secs] 
2014-05-06T21:44:51.101-0700: 542.646: [Full GC [PSYoungGen: 30000K->0K(315392K)] [PSOldGen: 1213653K->330467K(1129664K)] 1243653K->330467K(1445056K) [PSPermGen: 106752K->105936K(262144K)], 1.9804992 secs] [Times: user=1.97 sys=0.02, real=1.98 secs] 
2014-05-06T21:45:33.230-0700: 584.777: [Full GC [PSYoungGen: 10301K->0K(319616K)] [PSOldGen: 1105014K->404683K(1248192K)] 1115315K->404683K(1567808K) [PSPermGen: 122771K->122771K(262144K)], 1.9884441 secs] [Times: user=2.00 sys=0.00, real=1.99 secs] 
2014-05-06T21:46:24.169-0700: 635.719: [Full GC [PSYoungGen: 8805K->0K(331136K)] [PSOldGen: 1226188K->506891K(1449856K)] 1234994K->506891K(1780992K) [PSPermGen: 140233K->140233K(262144K)], 2.4515626 secs] [Times: user=2.45 sys=0.01, real=2.45 secs] 
2014-05-06T21:47:34.131-0700: 705.683: [Full GC [PSYoungGen: 45054K->0K(315392K)] [PSOldGen: 1412337K->452967K(1474560K)] 1457392K->452967K(1789952K) [PSPermGen: 141009K->141009K(262144K)], 2.3155816 secs] [Times: user=2.31 sys=0.00, real=2.32 secs] 
2014-05-06T21:48:17.036-0700: 748.590: [Full GC [PSYoungGen: 36896K->0K(315392K)] [PSOldGen: 1392671K->354862K(1476608K)] 1429567K->354862K(1792000K) [PSPermGen: 141037K->140960K(262144K)], 2.2993743 secs] [Times: user=2.28 sys=0.00, real=2.30 secs] 
2014-05-06T21:49:23.816-0700: 815.374: [Full GC [PSYoungGen: 45036K->0K(315392K)] [PSOldGen: 1454959K->348537K(1509824K)] 1499995K->348537K(1825216K) [PSPermGen: 140963K->140963K(262144K)], 2.0907144 secs] [Times: user=2.09 sys=0.00, real=2.09 secs] 
2014-05-06T22:04:38.728-0700: 1729.490: [Full GC [PSYoungGen: 8464K->0K(345344K)] [PSOldGen: 1505186K->422020K(1583296K)] 1513651K->422020K(1928640K) [PSPermGen: 151404K->151404K(262144K)], 3.2901866 secs] [Times: user=2.47 sys=0.00, real=3.31 secs] 
2014-05-06T22:09:33.924-0700: 2023.858: [Full GC [PSYoungGen: 9766K->0K(342784K)] [PSOldGen: 1573919K->443383K(1656064K)] 1583686K->443383K(1998848K) [PSPermGen: 168657K->168657K(262144K)], 2.8910659 secs] [Times: user=2.59 sys=0.02, real=2.90 secs] 
2014-05-06T22:16:32.650-0700: 2442.188: [Full GC [PSYoungGen: 5856K->0K(346944K)] [PSOldGen: 1650623K->398157K(1666944K)] 1656479K->398157K(2013888K) [PSPermGen: 176960K->153660K(262144K)], 5.6906733 secs] [Times: user=2.89 sys=0.11, real=5.69 secs] 
2014-05-06T22:32:40.071-0700: 3409.496: [Full GC [PSYoungGen: 8566K->0K(346304K)] [PSOldGen: 1655315K->460797K(1767360K)] 1663882K->460797K(2113664K) [PSPermGen: 155265K->155265K(262144K)], 3.8090058 secs] [Times: user=2.81 sys=0.03, real=3.81 secs] 
2014-05-06T22:37:53.667-0700: 3723.074: [Full GC [PSYoungGen: 10243K->0K(347456K)] [PSOldGen: 1761438K->458285K(1835712K)] 1771681K->458285K(2183168K) [PSPermGen: 162919K->162919K(262144K)], 4.0235684 secs] [Times: user=2.47 sys=0.06, real=4.02 secs] 
2014-05-06T22:49:53.387-0700: 4442.757: [Full GC [PSYoungGen: 6441K->0K(348224K)] [PSOldGen: 1823902K->451504K(1872448K)] 1830344K->451504K(2220672K) [PSPermGen: 171778K->171778K(262144K)], 2.7130221 secs] [Times: user=2.62 sys=0.02, real=2.71 secs] 
2014-05-06T22:59:59.750-0700: 5049.088: [Full GC [PSYoungGen: 6708K->0K(347904K)] [PSOldGen: 1862823K->365106K(1845248K)] 1869532K->365106K(2193152K) [PSPermGen: 180349K->154273K(262144K)], 9.6984956 secs] [Times: user=2.74 sys=0.16, real=9.70 secs] 
2014-05-06T23:10:35.103-0700: 5684.407: [Full GC [PSYoungGen: 7984K->0K(352192K)] [PSOldGen: 1843448K->585793K(2170304K)] 1851433K->585793K(2522496K) [PSPermGen: 160183K->160183K(262144K)], 2.9633020 secs] [Times: user=2.93 sys=0.00, real=2.96 secs] 
2014-05-06T23:24:38.972-0700: 6528.232: [Full GC [PSYoungGen: 7376K->0K(345344K)] [PSOldGen: 2159450K->684719K(2346304K)] 2166826K->684719K(2691648K) [PSPermGen: 164349K->164349K(262144K)], 8.8952964 secs] [Times: user=3.39 sys=0.05, real=8.90 secs] 
2014-05-06T23:28:10.950-0700: 6740.198: [Full GC [PSYoungGen: 9085K->0K(346368K)] [PSOldGen: 2343205K->724235K(2458176K)] 2352290K->724235K(2804544K) [PSPermGen: 186771K->186771K(262144K)], 3.5238065 secs] [Times: user=3.29 sys=0.00, real=3.52 secs] 
2014-05-06T23:33:58.654-0700: 7087.885: [Full GC [PSYoungGen: 9551K->0K(349184K)] [PSOldGen: 2453964K->597434K(2438976K)] 2463516K->597434K(2788160K) [PSPermGen: 205948K->166988K(262144K)], 12.9225761 secs] [Times: user=3.59 sys=0.12, real=12.92 secs] 
2014-05-06T23:48:13.206-0700: 7942.391: [Full GC [PSYoungGen: 37116K->0K(313728K)] [PSOldGen: 2422753K->610864K(2500736K)] 2459870K->610864K(2814464K) [PSPermGen: 172487K->172487K(262144K)], 4.9881546 secs] [Times: user=3.28 sys=0.03, real=4.99 secs] 
2014-05-07T00:06:19.018-0700: 9028.146: [Full GC [PSYoungGen: 8589K->0K(321920K)] [PSOldGen: 2492066K->620020K(2538304K)] 2500655K->620020K(2860224K) [PSPermGen: 179888K->179888K(262144K)], 3.5624930 secs] [Times: user=3.10 sys=0.06, real=3.56 secs] 
2014-05-07T00:13:51.029-0700: 9480.134: [Full GC [PSYoungGen: 8736K->0K(330688K)] [PSOldGen: 2529168K->564197K(2540800K)] 2537904K->564197K(2871488K) [PSPermGen: 185765K->185765K(262144K)], 3.0340707 secs] [Times: user=2.95 sys=0.00, real=3.03 secs] 
2014-05-07T00:25:39.758-0700: 10188.825: [Full GC [PSYoungGen: 17725K->0K(315392K)] [PSOldGen: 2500502K->543855K(2582528K)] 2518228K->543855K(2897920K) [PSPermGen: 190076K->167636K(262144K)], 12.3609341 secs] [Times: user=3.77 sys=0.16, real=12.36 secs] 
2014-05-07T00:44:01.053-0700: 11290.063: [Full GC [PSYoungGen: 12896K->0K(347264K)] [PSOldGen: 2575016K->547199K(2621184K)] 2587912K->547199K(2968448K) [PSPermGen: 172447K->172447K(262144K)], 6.9150070 secs] [Times: user=3.15 sys=0.03, real=6.92 secs] 
2014-05-07T01:41:59.932-0700: 14768.758: [Full GC [PSYoungGen: 23293K->0K(337152K)] [PSOldGen: 2638852K->580840K(2712704K)] 2662146K->580840K(3049856K) [PSPermGen: 175259K->175259K(262144K)], 11.0725751 secs] [Times: user=3.32 sys=0.13, real=11.07 secs] 
2014-05-07T01:47:58.358-0700: 15127.165: [Full GC [PSYoungGen: 4176K->0K(315392K)] [PSOldGen: 2684820K->577513K(2738176K)] 2688996K->577513K(3053568K) [PSPermGen: 179064K->179064K(262144K)], 6.5166056 secs] [Times: user=3.01 sys=0.02, real=6.52 secs] 
2014-05-07T01:55:50.842-0700: 15599.624: [Full GC [PSYoungGen: 18686K->0K(319616K)] [PSOldGen: 2735388K->490034K(2761664K)] 2754074K->490034K(3081280K) [PSPermGen: 182650K->169843K(262144K)], 21.2209796 secs] [Times: user=3.76 sys=0.45, real=21.22 secs] 
2014-05-07T02:10:06.723-0700: 16455.460: [Full GC [PSYoungGen: 5561K->0K(321600K)] [PSOldGen: 2752252K->569179K(2785280K)] 2757813K->569179K(3106880K) [PSPermGen: 175120K->175120K(262144K)], 3.2125221 secs] [Times: user=3.15 sys=0.02, real=3.21 secs] 
2014-05-07T02:22:36.759-0700: 17205.457: [Full GC [PSYoungGen: 16798K->0K(330304K)] [PSOldGen: 2775106K->580162K(2785280K)] 2791904K->580162K(3115584K) [PSPermGen: 181215K->181215K(262144K)], 3.1422615 secs] [Times: user=3.10 sys=0.02, real=3.14 secs] 
2014-05-07T02:45:44.317-0700: 18592.941: [Full GC [PSYoungGen: 45049K->0K(315392K)] [PSOldGen: 2769893K->624026K(2785280K)] 2814943K->624026K(3100672K) [PSPermGen: 200552K->200552K(262144K)], 3.4347254 secs] [Times: user=3.29 sys=0.00, real=3.44 secs] 
2014-05-07T03:08:18.628-0700: 19947.181: [Full GC [PSYoungGen: 5888K->0K(315392K)] [PSOldGen: 2783516K->419007K(2699456K)] 2789404K->419007K(3014848K) [PSPermGen: 218078K->169930K(262144K)], 11.5310136 secs] [Times: user=3.17 sys=0.20, real=11.53 secs] 
2014-05-07T03:34:53.638-0700: 21542.106: [Full GC [PSYoungGen: 13636K->0K(315392K)] [PSOldGen: 2691667K->554781K(2785280K)] 2705303K->554781K(3100672K) [PSPermGen: 180174K->180174K(262144K)], 2.9612961 secs] [Times: user=2.95 sys=0.02, real=2.96 secs] 
2014-05-07T03:48:06.871-0700: 22335.298: [Full GC [PSYoungGen: 45029K->0K(315392K)] [PSOldGen: 2757384K->645664K(2785280K)] 2802414K->645664K(3100672K) [PSPermGen: 191439K->191439K(262144K)], 4.0145124 secs] [Times: user=3.20 sys=0.05, real=4.01 secs] 
2014-05-07T03:52:48.829-0700: 22617.241: [Full GC [PSYoungGen: 9744K->0K(315392K)] [PSOldGen: 2773296K->575320K(2785280K)] 2783041K->575320K(3100672K) [PSPermGen: 199506K->199506K(262144K)], 3.2600004 secs] [Times: user=3.20 sys=0.00, real=3.26 secs] 
2014-05-07T03:57:47.110-0700: 22915.506: [Full GC [PSYoungGen: 10560K->0K(315392K)] [PSOldGen: 2771785K->436271K(2748608K)] 2782345K->436271K(3064000K) [PSPermGen: 215637K->170286K(262144K)], 6.1593551 secs] [Times: user=3.21 sys=0.05, real=6.16 secs] 
2014-05-07T04:05:46.130-0700: 23394.501: [Full GC [PSYoungGen: 45046K->0K(315392K)] [PSOldGen: 2699071K->624303K(2785280K)] 2744117K->624303K(3100672K) [PSPermGen: 182442K->182442K(262144K)], 3.2887751 secs] [Times: user=3.25 sys=0.00, real=3.29 secs] 
2014-05-07T04:09:00.646-0700: 23589.007: [Full GC [PSYoungGen: 45044K->0K(315392K)] [PSOldGen: 2716819K->709830K(2785280K)] 2761863K->709830K(3100672K) [PSPermGen: 187436K->187436K(262144K)], 3.4306116 secs] [Times: user=3.40 sys=0.00, real=3.43 secs] 
Thomas Mueller
  • 44,960
  • 12
  • 100
  • 124
  • 2
    Is the JVM heap being paged to disk? Can you try tweaking the survivor ratio? And could you start the JVM with `-XX:+PrintTenuringDistribution`? – nanofarad May 08 '14 at 10:29
  • Good questions! But I can't answer them. What would be the easiest way to find out whether the JVM heap being paged to disk? I could probably tweak the survivor ratio, but how would you tweak it and why? – Thomas Mueller May 08 '14 at 10:33
  • If the JVM heap is being paged crazily, in Windows Task Manager, if you select it to show the "page faults delta" and "page faults" columns on the processes tab, you would observe relatively large numbers for both. The survivor ratio can be decreased to prevent overflow back to the old generation, or increased to prevent sweeping a nearly-empty survivor space. Some experimentation can help in this case (trying 4 and 8 and comparing to 6). I'd imagine that the kernel time is high, therefore there's some paging or swapping (or potentially a lot of kernel-facilitated large copying) – nanofarad May 08 '14 at 10:37
  • @hexafraction Thanks a lot so far! I will update the question once I have more data. – Thomas Mueller May 08 '14 at 12:46
  • 1
    Scan you source code for "void finalize()"- are there any? What do they do? Do you need them? Why? – Java42 May 11 '14 at 19:52
  • @Java42 Why do you think `finalize` is the reason? (It's a lot of code, and I don't have the source code for most pieces.) – Thomas Mueller May 12 '14 at 07:22
  • `finalize` is one of the [usual suspects](http://stackoverflow.com/questions/2860121). In short: It's single theaded, forces to GC the object twice, and is a PITA anyway. You can either decompile the bytecode and grep or simply [load all classes](http://code.google.com/p/guava-libraries/wiki/ReflectionExplained#ClassPath) and use reflection to find the method. – maaartinus May 12 '14 at 08:59
  • 2
    A lot of code? No source? Come on - How were you going to implement a fix? What's the reason for this question? Your looking for root-cause with your hands tied. Not possible. – Java42 May 12 '14 at 11:58
  • @Java42 Primarily I'm searching for the root cause of the problem, and ways to analyze it. Not having the source code of libraries is quite common. And I have never heard that "finalize" causes full GC pauses of over 20 seconds (but just "real" time and not "sys" and "user"). But if you have more details what to analyze that would be great! – Thomas Mueller May 12 '14 at 13:50
  • If the problem is caused by "finalize", it should show up in full thread dumps I think. I do have many full thread dumps and will analyze them. – Thomas Mueller May 12 '14 at 13:55
  • @ThomasMueller - Also, check the CPU utilization on the problem system during the observed long GC cycles. High CPU % can also cause your symptom. Also, if your test machine is a guest running on some virtualized environment, cycle stealing (ie Steal-Time) can be an issue. – Java42 May 12 '14 at 20:28
  • @Java42 Originally the test was run in a virtualized environment, there the pauses were 80 seconds and more ("real" time). But the 20 seconds pauses are on a physical machine, where nothing else is running. – Thomas Mueller May 13 '14 at 08:39

2 Answers2

1

The problem turned out to be allocating more heap memory to the JVM than there is physical memory. In addition to running 3 JVMs, a MongoDB instance was also running on the same machine. The amount of physical memory was clearly lower than the allocated heap memory (java -Xmx...)

So the problem was virtual memory trashing.

Thomas Mueller
  • 44,960
  • 12
  • 100
  • 124
0

Unfortunately, this can happen pretty easily with the G1 garbage collector.

Thread and heap dumps are always a great way to figure out what's really going on with a system.

+1 to suggestion to check for finalize.

Information on driving toward a fix: Here are some VM options to change gc strategy or tune the existing strategy. Higher level Java 7 gc information.

Barett
  • 5,226
  • 6
  • 46
  • 53
  • Could you explain how and why this can happen with the G1 collector, or provide a link? I don't think the G1 collector is used here by the way, but I would still be interested in the details of your answer. – Thomas Mueller Jun 16 '14 at 07:38
  • Large heaps (32+ GB) can cause G1 to stall the VM for human-significant amounts of time. More great info: [Java G1 garbage collection in production](http://stackoverflow.com/q/2254041) Agreed, not sure G1 is being used. Pretty easy for you to find out. – Barett Jun 18 '14 at 05:23
  • The heap size is 3072 MB (`-Xmx3072m` above), and G1 is not used (we are using Java 6 or 7, where G1 is not the default, and `-XX:+UseG1GC` is not used). – Thomas Mueller Jun 19 '14 at 05:57