
Using Spark 1.6.1 with a standalone master, I need to run multiple applications on the same Spark master. All applications submitted after the first one stay in the WAITING state indefinitely. I also observed that the running application holds all the cores, i.e. the sum of the workers' cores. I already tried limiting this with SPARK_EXECUTOR_CORES, but that is a YARN setting, while I am running a standalone master. I also tried running many workers on the same master, but every time the first submitted application consumes all the workers.

Sankalp

4 Answers


I was having the same problem on a Spark standalone cluster.

What I found is that Spark was somehow allocating all of the cluster's resources to a single job. We need to cap the resources per job so that there is room to run other jobs as well.

Below is the command I use to submit a Spark job:

bin/spark-submit --class classname --master spark://hjvm1:6066 --deploy-mode cluster  --driver-memory 500M --conf spark.executor.memory=1g --conf spark.cores.max=1 /data/test.jar

A crucial parameter for running multiple jobs in parallel on a Spark standalone cluster is spark.cores.max. Note that spark.executor.instances, num-executors and spark.executor.cores alone won't allow you to achieve this on Spark standalone; all your jobs except the single active one will be stuck in the WAITING state.

Spark-standalone resource scheduling:

The standalone cluster mode currently only supports a simple FIFO scheduler across applications. However, to allow multiple concurrent users, you can control the maximum number of resources each application will use. By default, it will acquire all cores in the cluster, which only makes sense if you just run one application at a time. You can cap the number of cores by setting spark.cores.max ...
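To make the FIFO behaviour described above concrete, here is a small toy model (plain Python, not Spark code; the all-or-nothing grant is a simplification of the real scheduler) of how a standalone master hands out cores:

```python
# Toy model of Spark standalone FIFO core allocation.
# Each submitted app asks for up to its spark.cores.max cores
# (None = unset = grab everything, the standalone default);
# later apps wait until enough cores are free.

def allocate(total_cores, apps):
    """apps: list of (name, cores_max); cores_max=None means 'all cores'.
    Returns {name: cores granted now}; 0 means the app sits in WAITING."""
    free = total_cores
    granted = {}
    for name, cores_max in apps:  # FIFO order, like the standalone master
        want = total_cores if cores_max is None else cores_max
        take = want if want <= free else 0  # simplified all-or-nothing grant
        granted[name] = take
        free -= take
    return granted

# Default behaviour: the first app takes every core, the rest WAIT.
print(allocate(8, [("app1", None), ("app2", None)]))
# -> {'app1': 8, 'app2': 0}

# With spark.cores.max=2 per app, four apps run concurrently on 8 cores.
print(allocate(8, [(f"app{i}", 2) for i in range(1, 5)]))
# -> {'app1': 2, 'app2': 2, 'app3': 2, 'app4': 2}
```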

MaxNevermind

I am assuming you run all the workers on one server and are trying to simulate a cluster; otherwise you could use a single worker and master to run a standalone Spark cluster.
Executor cores are something completely different from normal CPU cores. Setting the number of executors requires YARN, as you said earlier. Executor cores are the number of concurrent tasks an executor can run (when using HDFS it is advisable to keep this below 5) [1].

The cores you want to limit to keep the workers from taking everything are the "CPU cores". These are specified in the configuration of Spark 1.6.1 [2]. In Spark there is an option to set the number of CPU cores when starting a slave [3]. This is done with -c CORES, --cores CORES, which defines the total CPU cores that Spark applications may use on the machine (default: all available); worker only.

The command to start a worker with a limited core count would then be something like this:

./sbin/start-slave.sh spark://master:7077 --cores 2

(Note that --cores is a worker-only option; start-all.sh does not accept it.)
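Alternatively, the same limit can be set persistently in conf/spark-env.sh using the documented SPARK_WORKER_CORES variable, so that every worker started via the sbin scripts picks it up (an untested sketch):

```sh
# conf/spark-env.sh on each worker machine:
# cap the CPU cores this worker offers to applications (default: all available)
export SPARK_WORKER_CORES=2
```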

Hope this helps

Paul Velthuis

In the configuration settings, add this line to the file "./conf/spark-env.sh":

export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"

The maximum cores for the master will now be limited to 1. If multiple Spark applications are running, each will use only one core by default. Then define the number of workers and give the workers this setting:

export SPARK_WORKER_OPTS="-Dspark.deploy.defaultCores=1"

Each worker then has one core as well. Remember that this has to be set for every worker in the configuration settings.

Paul Velthuis
    This answer is wrong. The goal of the question is to run in a cluster with workers; this answer would only work for a local job, since it applies only to the master. The master will now consume only one core. SPARK_MASTER_OPTS takes configuration properties that apply only to the master, in the form "-Dx=y" (default: none). The worker should be adjusted with SPARK_WORKER_OPTS, configuration properties that apply only to the worker, in the form "-Dx=y" (default: none). If the cores were set on the worker, this answer would work. – Paul Velthuis Jun 01 '17 at 09:55
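Putting the comment's correction together with the worker-side setting from the standalone docs, a possible conf/spark-env.sh sketch would be (untested; spark.deploy.defaultCores is a master-side property, while SPARK_WORKER_CORES is the documented worker-side knob):

```sh
# conf/spark-env.sh

# Master side: default per-application core cap, used by apps
# that do not set spark.cores.max themselves.
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=1"

# Worker side: total CPU cores this worker offers to applications.
export SPARK_WORKER_CORES=1
```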