1

[Screenshot: heap memory usage graph]

I'm looking at the heap memory graph in Flink and I see that heap memory usage always grows. When does the GC kick in? Are there Flink classes for controlling the GC?

Giuseppe17
  • 45
  • 5

2 Answers

1

I assume you are doing some stateful operations in your Flink app. This leads to state being managed by Flink, which lives on the heap. If you don't clear state that is no longer relevant, it will keep growing and eventually crash the JVM.

This state will not be GCed as it is critical to your app.
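To illustrate why live state survives GC, here is a minimal plain-Java sketch. It is only an analogy, not Flink's state API: the `state` map and the `put` helper are hypothetical stand-ins for whatever holds your state. As long as a value is reachable through the map, the collector must keep it. Once the entry is removed, it becomes eligible for collection, although `System.gc()` is only a hint.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class ReachableStateDemo {
    // Hypothetical stand-in for application state; not a Flink class.
    static final Map<String, byte[]> state = new HashMap<>();

    /** Stores a value under the key and returns a weak reference for observing collection. */
    static WeakReference<byte[]> put(String key, int sizeBytes) {
        byte[] value = new byte[sizeBytes];
        state.put(key, value);
        return new WeakReference<>(value);
    }

    public static void main(String[] args) {
        WeakReference<byte[]> ref = put("user-1", 1024 * 1024);

        // The map still holds a strong reference, so GC cannot reclaim the value.
        System.gc();
        System.out.println("in state, still reachable: " + (ref.get() != null));

        // After removal nothing references the value, so a later GC may reclaim it.
        state.remove("user-1");
        System.gc();
        System.out.println("after removal, collected: " + (ref.get() == null));
    }
}
```

The same reasoning applies to state kept by a stateful operator: it stays on the heap until the application clears it.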

Gaurav Kumar
  • 1,026
  • 12
  • 26
  • I use jconsole to see this, and it has a button to trigger GC, which cleans the heap memory. Obviously it only cleans the heap memory that Flink no longer needs. I use Java 8 and I have read that it uses a parallel GC by default. Can Flink change the type of GC? – Giuseppe17 Nov 28 '19 at 09:35
0

The garbage collector of Java is hard to control from the application. If the machine is not idling, GC may only happen when it is actually reaching the heap limit.

From your screenshot, I can see that GC is probably never invoked. I actually don't even see any issue at all. If you don't want Java to take more than 50 MB of RAM, you should set `-Xmx` accordingly. Then GC will be invoked when the heap hits that limit.
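If you want to check from inside the JVM whether GC has run at all, a rough sketch using the standard `java.lang.management` API could look like this. The class name `GcActivity` and the helper `totalCollections` are made up for the example. jconsole shows the same numbers.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcActivity {
    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // The upper bound the heap may grow to, i.e. the effective -Xmx.
        long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("max heap (MB): " + maxMb);

        // One line per collector: how often it ran and how long it took in total.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("total collections: " + totalCollections());
    }
}
```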

Just as an excursion to Java land: When an object is not used anymore, the memory is not freed immediately. Only when a GC is invoked, it's possible to reclaim that memory. You gave your Java VM 2 GB of your RAM, so it thinks that it can use that 2 GB fully without causing any issues. To improve performance, GC is invoked as rarely as possible. So it may opt to not run GC at all if you are that far away from the limit.
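A small sketch of that behavior, assuming nothing beyond the standard `Runtime` API (the names `HeapUsageDemo`, `usedBytes` and `churn` are invented for illustration): the program creates arrays that become garbage right away, and the used heap typically stays high until a collection actually runs. The exact numbers depend on the JVM, the collector and the heap size, and `System.gc()` is only a request.

```java
class HeapUsageDemo {
    /** Heap currently in use: committed heap minus the free part of it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Allocates short-lived arrays that are garbage immediately; returns the bytes allocated. */
    static long churn(int arrays, int sizeBytes) {
        long allocated = 0;
        for (int i = 0; i < arrays; i++) {
            byte[] tmp = new byte[sizeBytes];
            tmp[0] = 1; // touch the array so it is really used
            allocated += tmp.length;
        }
        return allocated;
    }

    public static void main(String[] args) {
        System.out.println("used before:      " + usedBytes());

        long allocated = churn(64, 1024 * 1024);
        System.out.println("garbage created:  " + allocated);
        System.out.println("used after churn: " + usedBytes());

        // Ask for a collection; the JVM is free to ignore the request.
        System.gc();
        System.out.println("used after gc:    " + usedBytes());
    }
}
```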

Java's GCs are constantly being improved, and Java 8 is quite old. Newer versions may be more aggressive, and you may actually see different behavior on Java 13. You can set the GC directly in your `JVM_ARGS`, but I don't see any need for that.
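To see which heap and collector options a running JVM was actually started with, something along these lines may help. It is a sketch: the `gcRelated` filter is a hypothetical helper and only recognizes `-Xmx`, `-Xms` and `-XX:+Use...GC` style flags such as `-XX:+UseG1GC` or `-XX:+UseParallelGC`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class JvmGcSettings {
    /** Keeps only heap-size and collector-selection options from a list of JVM arguments. */
    static List<String> gcRelated(List<String> jvmArgs) {
        List<String> out = new ArrayList<>();
        for (String arg : jvmArgs) {
            boolean heapSize = arg.startsWith("-Xmx") || arg.startsWith("-Xms");
            boolean collector = arg.startsWith("-XX:+Use") && arg.endsWith("GC");
            if (heapSize || collector) {
                out.add(arg);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM on startup (for example via JVM_ARGS).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("GC-related JVM args: " + gcRelated(jvmArgs));

        // Names of the collectors that are actually active in this JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("collector: " + gc.getName());
        }
    }
}
```

An empty argument list means the JVM is running with its defaults for heap size and collector.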

As Gaurav Kumar pointed out, it may also be inherent to your Flink application that some objects are never freed because they are crucial (state).

However, I don't see anything wrong in what you provided. I'm guessing that you have other concerns that you haven't shared so far. Could you maybe rephrase your question in a way that reflects your thoughts behind this initial question?

Arvid Heise
  • 3,019
  • 2
  • 10
  • Yes, I use Java 8 and Flink 1.7.1. I have one source, a Kafka topic, and I'm in streaming mode. I use exactly-once and I have set the checkpoint interval to 12 sec. I use jconsole to see this result. – Giuseppe17 Nov 28 '19 at 09:45
  • I doubt that a newer Java version will waste more CPU cycles on cleaning up a heap with less than 3% usage. And after the “excursion to Java land”, it’s worth noting that other programming languages do not actually free memory immediately either. There’s just a different kind of bookkeeping that allows you to plot better-looking graphs without any true consequences for the system. – Holger Nov 28 '19 at 12:40
  • No problem, I'd like to understand how it works. Is 1.8 GB the default memory of the JDK? Or does it depend on the Flink parameters? – Giuseppe17 Nov 28 '19 at 13:42
  • It depends on how you execute Flink. If you start a cluster, [Flink's default is 1 GB for task and jobmanager](https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html). If you run in your IDE, it depends on your [Java version, Java implementation, and hardware](https://stackoverflow.com/questions/28272923/default-xmxsize-in-java-8). – Arvid Heise Nov 28 '19 at 19:05
  • I start it in a cluster. There are 3 processes: 1. the job manager process; 2. the task manager process; 3. the application process. Processes 1 and 2 each have 1 GB of memory, the 3rd has 1.8 GB. Why? What does the 1.8 GB depend on? This picture represents the heap memory usage of the 3rd process. Why is it so linear? – Giuseppe17 Nov 28 '19 at 19:43
  • You can influence the heap size of the CLI client by setting `JVM_ARGS` appropriately. First check what's currently in the variable and then add or replace `-Xmx512m` if you want that process to take only half a GB. – Arvid Heise Nov 29 '19 at 07:48
  • @Holger it seems you could force Shenandoah into something like that, via `ShenandoahGCHeuristics=adaptive` (and `ShenandoahMinFreeThreshold` implicitly) – Eugene Nov 30 '19 at 19:48