
I have a binary that loads a list of short strings on startup and stores it in memory as a map from string to protobuf (where the protobuf itself also contains the string). (Not ideal, but that design is hard to change due to legacy issues.) Recently that list has grown from ~2M to ~20M entries, causing the binary to fail when constructing the map.
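
For context, the structure is roughly like this (a sketch; `MyProto`, `getName()`, and `loadedProtos` are hypothetical stand-ins for the real types):

```java
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the structure described above: ~20M short strings, each
// mapped to a protobuf message that also contains that same string.
// MyProto, getName() and loadedProtos are hypothetical stand-ins.
Map<String, MyProto> lookup = new HashMap<>();
for (MyProto proto : loadedProtos) {
    lookup.put(proto.getName(), proto); // the key duplicates a field of the value
}
```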

First I got OutOfMemoryError: Java heap space.

When I increased the heap size using -Xms and -Xmx, we ran into GC overhead limit exceeded.

It runs on a 64-bit Linux machine with 15GB of available memory, with the following JVM args (I increased the machine's RAM from 10G to 15G and the heap flags from 6000M to 9000M):

-Xms9000M -Xmx9000M -XX:PermSize=512m -XX:MaxPermSize=2018m

This binary does a whole lot of things and is serving live traffic, so I can't afford to have it get stuck occasionally.

Edit: I eventually did the obvious thing: fixed the code (changed from HashMap to ImmutableSet, as sketched below) and added more RAM (-Xmx11000M).
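
The fix was along these lines (a sketch, assuming the map was only consulted for membership, since the protobuf already contains the string; `MyProto` and `getName()` are hypothetical stand-ins as above):

```java
import com.google.common.collect.ImmutableSet;

// Sketch of the fix: since each protobuf already contains its string, a
// Guava ImmutableSet of the strings replaces the HashMap<String, MyProto>,
// assuming the map was only used for membership checks.
ImmutableSet.Builder<String> builder = ImmutableSet.builder();
for (MyProto proto : loadedProtos) {
    builder.add(proto.getName());
}
ImmutableSet<String> keys = builder.build();
boolean present = keys.contains(someKey);
```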

Shalev Manor
  • https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/memleaks002.html. Of course you're running on a 64-bit operating system with a 64-bit JVM. How much physical RAM is installed and available? What else runs on this server? – duffymo Feb 23 '16 at 10:01
  • You have aptly described the problem. I think you should show more of what you have tried or solution directions you are considering, and formulate a clear question. – Adriaan Koster Feb 23 '16 at 10:01
  • There's a pretty good explanation here: https://plumbr.eu/outofmemoryerror/gc-overhead-limit-exceeded – Boris Pavlović Feb 23 '16 at 10:02
  • @duffymo added more context – Shalev Manor Feb 23 '16 at 10:35
  • You are assuming this data structure is the issue. Get a profiler and measure it to be sure. I'd wonder why you have to have all that data in memory. Isn't this what databases are for? A solution like this can't scale horizontally. – duffymo Feb 23 '16 at 10:42
  • It works when I revert to the old 2M list, so it's pretty clear that is the issue. That's a valid point. When this solution was designed it was not foreseen the list would grow to that size. I'm looking for a temporary solution if that's possible until we have a more scalable one. – Shalev Manor Feb 23 '16 at 10:54
  • Have a look at this SE question: http://stackoverflow.com/questions/5839359/java-lang-outofmemoryerror-gc-overhead-limit-exceeded/34296665#34296665. Can you post complete VM parameter list? – Ravindra babu Feb 23 '16 at 11:59
  • and one more question: http://stackoverflow.com/questions/1393486/error-java-lang-outofmemoryerror-gc-overhead-limit-exceeded/35244518#35244518 – Ravindra babu Feb 23 '16 at 12:07
  • you should enable GC logging and then post them here or run them through GCViewer. It's possible that the heap is still too small or the young-old boundary is suboptimally placed due to the large initial heap size. – the8472 Feb 23 '16 at 12:31
  • @ravindra There are no more args. I read through those; the main advice there is to increase the heap size, which doesn't seem to help. – Shalev Manor Feb 23 '16 at 12:44
  • Then you have to analyse with profiling tools like MAT or visualvm. Do you need huge live object data in oldgen or is it a memory leak? – Ravindra babu Feb 23 '16 at 12:57
  • you should quantify the data better. _20M * x_ tells us nothing if we do not know the _x_. If the data increases 10-fold then in the worst case you need at least 10 times as large a heap if there was little safety margin before. – the8472 Feb 23 '16 at 13:35

1 Answer


> I'm looking for a temporary solution if that's possible until we have a more scalable one.

First, you need to figure out if the "OOME: GC overhead limit exceeded" is due to the heap being:

  • too small ... causing the JVM to do repeated Full GCs, or

  • too large ... causing the JVM to thrash the virtual memory when a Full GC is run.

You should be able to distinguish these two cases by turning on and examining the GC logs, and using OS-level monitoring tools to check for excessive paging loads. (When checking the paging levels, also check that the problem isn't due to competition for RAM between your JVM and another memory-hungry application.)
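
On a Java 8 era JVM like the one described here, GC logging can be enabled with flags along these lines (these were superseded by -Xlog:gc* in Java 9+; the log path is just an example):

```
# Java 8 style GC logging flags
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/myapp-gc.log

# OS-level check for paging: nonzero si/so columns indicate swapping
vmstat 5
```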

If the heap is too small, try making it bigger. If it is too big, make it smaller. If your system is showing both symptoms ... then you have a big problem.

You should also check that "compressed oops" is enabled for your JVM, as that will reduce your JVM's memory footprint. The -XshowSettings option lists the settings in effect when the JVM starts. Use -XX:+UseCompressedOops to enable compressed oops if they are disabled.

(You will probably find that compressed oops are enabled by default, but it is worth checking. This would be an easy fix ... )
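For example, something along these lines should work (the -Xmx value is just the one from the question; compressed oops are typically on by default for heaps below ~32GB):

```
# Show the VM settings the JVM starts with
java -XshowSettings:vm -version

# Confirm whether compressed oops are in effect at this heap size
java -Xmx9000M -XX:+PrintFlagsFinal -version | grep UseCompressedOops
```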

If none of the above work, then your only quick fix is to get more RAM.


But obviously, the real solution is to reengineer the code so that you don't need a huge (and increasing over time) in-memory data structure.

Stephen C