144

How can I increase the memory available for Apache Spark executor nodes?

I have a 2 GB file that is suitable for loading into Apache Spark. For the moment I am running Apache Spark on one machine, so the driver and executor are on the same machine. The machine has 8 GB of memory.

When I try to count the lines of the file after setting it to be cached in memory, I get these errors:

2014-10-25 22:25:12 WARN  CacheManager:71 - Not enough space to cache partition rdd_1_1 in memory! Free memory is 278099801 bytes.
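
For reference, this is roughly what I run in spark-shell to reproduce it (the file path is just a placeholder):

// placeholder path; cache() marks the RDD for in-memory storage, count() forces evaluation
val lines = sc.textFile("/path/to/my-2gb-file.txt")
lines.cache()
lines.count()   // this is where the "Not enough space to cache partition" warnings show up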

I looked at the documentation here and set spark.executor.memory to 4g in $SPARK_HOME/conf/spark-defaults.conf

The UI shows this variable is set in the Spark Environment.

However, when I go to the Executor tab, the memory limit for my single executor is still set to 265.4 MB. I also still get the same error.

I tried various things mentioned here, but I still get the error and don't have a clear idea where I should change the setting.

I am running my code interactively from the spark-shell.

– WillamS

12 Answers

203

Since you are running Spark in local mode, setting spark.executor.memory won't have any effect, as you have noticed. The reason for this is that the Worker "lives" within the driver JVM process that you start when you start spark-shell and the default memory used for that is 512M. You can increase that by setting spark.driver.memory to something higher, for example 5g. You can do that by either:

  • setting it in the properties file (default is $SPARK_HOME/conf/spark-defaults.conf),

    spark.driver.memory              5g
    
  • or by supplying the configuration setting at runtime:

    $ ./bin/spark-shell --driver-memory 5g
    

Note that this cannot be achieved by setting it in the application, because by then it is already too late: the process has already started with some amount of memory.

The reason for 265.4 MB is that Spark dedicates spark.storage.memoryFraction * spark.storage.safetyFraction of the heap to storage memory, and by default they are 0.6 and 0.9:

512 MB * 0.6 * 0.9 ~ 265.4 MB

So be aware that not the whole amount of driver memory will be available for RDD storage.
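
If you want to sanity-check what the driver JVM actually received, you can inspect it from spark-shell. This is just a rough sketch using the Spark 1.x defaults mentioned above; note that the fractions are applied to the heap size the JVM reports, which is a bit below the -Xmx value, hence ~265.4 MB rather than 512 MB * 0.6 * 0.9 = 276.5 MB:

// started with: ./bin/spark-shell --driver-memory 5g
val maxHeapMb = Runtime.getRuntime.maxMemory / 1024.0 / 1024.0   // max heap the driver JVM reports
val storageMb = maxHeapMb * 0.6 * 0.9                            // memoryFraction * safetyFraction
println(f"max heap: $maxHeapMb%.1f MB, approx. storage memory: $storageMb%.1f MB")
println(sc.getConf.get("spark.driver.memory", "not set"))        // the value Spark was asked for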

But when you start running this on a cluster, the spark.executor.memory setting will take over when calculating the amount to dedicate to Spark's memory cache.

– Grega Kešpret
  • Is 5g equivalent to 5Gb? – Chuck Jan 28 '20 at 09:19
  • @Chuck http://spark.apache.org/docs/latest/configuration.html#memory-management "Amount of memory to use for the driver process, i.e. where SparkContext is initialized, in the same format as JVM memory strings with a size unit suffix ("k", "m", "g" or "t") (e.g. 512m, 2g)." – James Moore Jul 14 '20 at 16:26
43

Also note that for local mode you have to set the amount of driver memory before starting the JVM:

bin/spark-submit --driver-memory 2g --class your.class.here app.jar

This will start the JVM with 2G instead of the default 512M.
Details here:

For local mode you only have one executor, and this executor is your driver, so you need to set the driver's memory instead. That said, in local mode, by the time you run spark-submit, a JVM has already been launched with the default memory settings, so setting "spark.driver.memory" in your conf won't actually do anything for you. Instead, you need to run spark-submit as shown above.
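
To see that there really is just the one executor in local mode, and what its storage memory limit is, you can run something like this from the shell (a small sketch using the standard SparkContext API):

// in local mode this prints a single entry: the driver process itself
sc.getExecutorMemoryStatus.foreach { case (id, (maxMem, remaining)) =>
  println(s"$id: max storage = ${maxMem / 1024 / 1024} MB, remaining = ${remaining / 1024 / 1024} MB")
}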

– Dmitriy Selivanov
8

The answer submitted by Grega helped me solve my issue. I am running Spark locally from a Python script inside a Docker container. Initially I was getting a Java out-of-memory error when processing some data in Spark, but I was able to assign more memory by adding the following lines to my script:

conf=SparkConf()
conf.set("spark.driver.memory", "4g") 

Here is a full example of the Python script which I use to start Spark:

import os
import sys
import glob

spark_home = '<DIRECTORY WHERE SPARK FILES EXIST>/spark-2.0.0-bin-hadoop2.7/'
driver_home = '<DIRECTORY WHERE DRIVERS EXIST>'

if 'SPARK_HOME' not in os.environ:
    os.environ['SPARK_HOME'] = spark_home 

SPARK_HOME = os.environ['SPARK_HOME']

# Make PySpark importable from the Spark installation
sys.path.insert(0, os.path.join(SPARK_HOME, "python"))
for lib in glob.glob(os.path.join(SPARK_HOME, "python", "lib", "*.zip")):
    sys.path.insert(0, lib)

from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext

conf = SparkConf()
conf.set("spark.executor.memory", "4g")
conf.set("spark.driver.memory", "4g")  # must be set before the SparkContext (and its JVM) is created
conf.set("spark.cores.max", "2")
conf.set("spark.driver.extraClassPath",
    driver_home+'/jdbc/postgresql-9.4-1201-jdbc41.jar:'\
    +driver_home+'/jdbc/clickhouse-jdbc-0.1.52.jar:'\
    +driver_home+'/mongo/mongo-spark-connector_2.11-2.2.3.jar:'\
    +driver_home+'/mongo/mongo-java-driver-3.8.0.jar') 

sc = SparkContext.getOrCreate(conf)

spark = SQLContext(sc)

– Sarah
6

Apparently, the question never says to run in local mode rather than on YARN. Somehow I couldn't get the spark-defaults.conf change to work. Instead I tried this and it worked for me:

bin/spark-shell --master yarn --num-executors 6  --driver-memory 5g --executor-memory 7g

(I couldn't bump executor-memory to 8g; there is some restriction from the YARN configuration.)
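
Once the shell is up, you can check what was actually applied (a quick sketch; these are standard Spark configuration keys, and --num-executors maps to spark.executor.instances):

println(sc.getConf.get("spark.executor.memory", "default"))     // expect 7g (from --executor-memory)
println(sc.getConf.get("spark.driver.memory", "default"))       // expect 5g (from --driver-memory)
println(sc.getConf.get("spark.executor.instances", "default"))  // expect 6 (from --num-executors)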

– Somum
4

You need to increase the driver memory. On a Mac (i.e. when running on a local master), the default driver memory is 1024M, so by default only about 380MB is allotted to the executor.

Upon increasing the driver memory with --driver-memory 2G, the executor memory increased to ~950MB.

– Sanchay
3

As far as I know, it is not possible to change spark.executor.memory at run time. If you are running a stand-alone version with pyspark and graphframes, you can launch the pyspark REPL by executing the following command:

pyspark --driver-memory 2g --executor-memory 6g --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11

Be sure to set the SPARK_VERSION environment variable appropriately for the Spark release you are using.

– Taie
2

Create a file called spark-env.sh in the spark/conf directory and add this line:

SPARK_EXECUTOR_MEMORY=2000m   # memory size you want to allocate to the executor
– Mohamed Thasin ah
  • Exactly, I run the master with a concrete config, so I wouldn't need to add options every time I run a spark command. But this is only for a cluster node; in case it's standalone the setting is `SPARK_WORKER_MEMORY`. – Evhz May 23 '18 at 09:43
2
# --deploy-mode can be client for client mode
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  /path/to/examples.jar \
  1000

– keven
1

You can build the command using the following example:

spark-submit --jars /usr/share/java/postgresql-jdbc.jar --num-executors 3 --driver-memory 10g --executor-memory 10g --executor-cores 1 --master local --deploy-mode client --name wordcount3 --conf "spark.app.id=wordcount" --class com.examples.WordCount3 /home/vaquarkhan/spark-scala-maven-project-0.0.1-SNAPSHOT.jar

– vaquar khan
1

Spark executor memory is used to run your Spark tasks, based on the instructions given by your driver program. Basically, how much of it is required depends on the job you submit.

Executor memory includes the memory required for executing the tasks plus overhead memory, and the total should not be greater than the JVM heap size or the YARN maximum container size.
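
As a rough worked example (just a sketch; the max(384 MB, 10%) overhead heuristic is the usual YARN default, so check the docs for your version), the per-executor container that YARN has to grant looks like this:

// rough per-container memory request on YARN for a 2g executor (hypothetical numbers)
val executorMemoryMb = 2048                                      // spark.executor.memory = 2g
val overheadMb = math.max(384, (executorMemoryMb * 0.10).toInt)  // default overhead heuristic: max(384 MB, 10%)
val containerMb = executorMemoryMb + overheadMb                  // must fit under yarn.scheduler.maximum-allocation-mb
println(s"container request: ~$containerMb MB")                  // ~2432 MB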

Add the following parameters to spark-defaults.conf:

spark.executor.cores=1
spark.executor.memory=2g

If you are using a cluster management tool like Cloudera Manager or Ambari, please refresh the cluster configuration so that the latest configs are propagated to all nodes in the cluster.

Alternatively, we can pass the executor cores and memory values as arguments when running the spark-submit command, along with the class and application path.

Example:

# --deploy-mode can be client for client mode
spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  /path/to/examples.jar \
  1000

– Radhakrishnan Rk
1

You mentioned that you are running your code interactively in spark-shell. If no proper value is set for driver memory or executor memory, Spark assigns a default value, which is taken from its properties file (where the defaults are specified).

I hope you are aware that there is one driver (master node) and there are worker nodes (where the executors are created and run), so a Spark program basically needs two kinds of memory. So, if you want to set the driver memory, do it when starting spark-shell:

spark-shell --driver-memory "your value"

and to set the executor memory:

spark-shell --executor-memory "your value"

Then I think you are good to go with the desired amount of memory for your spark-shell.

– A.Mishra
1

On Windows or Linux, you can use this command:

spark-shell --driver-memory 2G


– Paul Roub