
Because my computational tasks require fast disk I/O, I am interested in mounting large RAM disks on each worker node in a YARN cluster that runs Spark, and am thus wondering how the YARN cluster manager handles the memory occupied by such a RAM disk.

If I were to allocate 32GB to a RAM disk on each 128GB machine, for example, would the YARN cluster manager know how to allocate RAM so as to avoid over-allocating memory when performing tasks? (In this case, does YARN hand out the full 128GB of RAM to the requisitioned tasks, or at most only 96GB?)

If so, is there any way to indicate to the YARN cluster manager that a RAM disk is present, and that a specific portion of RAM is therefore off limits to YARN? Would Spark be aware of these constraints as well?
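
For concreteness, the kind of RAM disk I have in mind is just a tmpfs mount on each worker (the mount point below is only an example):

    # create a 32GB RAM-backed filesystem on a worker node
    sudo mkdir -p /mnt/ramdisk
    sudo mount -t tmpfs -o size=32g tmpfs /mnt/ramdisk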

Han Altae-Tran

1 Answer


In the Spark configuration you can set driver and executor options such as the number of cores and the amount of memory to allocate. Moreover, when you use YARN as the resource manager, there are some extra configurations supported by it that can help you manage cluster resources better, for example "spark.driver.memoryOverhead" or "spark.yarn.am.memoryOverhead", which is the amount of off-heap space, with a default value of

AM memory * 0.10, with minimum of 384
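
As a sketch (the values below are placeholders, and my_app.py stands in for your application), these options can be passed to spark-submit:

    # spark.driver.memoryOverhead applies to the driver in cluster mode;
    # spark.yarn.am.memoryOverhead applies to the Application Master in client mode
    spark-submit \
      --master yarn \
      --conf spark.driver.memoryOverhead=1g \
      --conf spark.yarn.am.memoryOverhead=512m \
      my_app.py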

For further information, see the link.
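
As for the RAM disk itself: by default YARN does not inspect how much physical memory is actually free on a node; the NodeManager simply offers whatever yarn.nodemanager.resource.memory-mb is set to. So one way to keep YARN (and hence Spark executors, which only receive memory through YARN containers) out of the RAM disk's 32GB on a 128GB machine is to cap that value in yarn-site.xml on each worker. A sketch (in practice you would set it somewhat lower to leave room for the OS and other daemons):

    <!-- yarn-site.xml: advertise only 96GB of the 128GB node to YARN,
         leaving the remaining 32GB for the RAM disk -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>98304</value> <!-- 96 * 1024 MB -->
    </property>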

  • But isn't that AM memory for the YARN application master? I want to make sure that both YARN and Spark know that they can't touch that partition of memory, so as not to reset the RAM Disk – Han Altae-Tran Jun 30 '18 at 21:48
  • Yeah, that's a YARN configuration. But what do you mean by resetting the RAM Disk? – Amin Heydari Alashti Jul 01 '18 at 03:35
  • If I have a RAM Disk installed (taking up 32GB), will YARN allocate that 32GB of RAM to a Spark context, thus pushing the RAM Disk off RAM? – Han Altae-Tran Jul 01 '18 at 15:43