Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark's standalone deploy mode (not local mode).

This tag should be used for questions specific to the standalone deploy mode. Questions might include cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler alternative to more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e., one that runs no other workloads).

"Standalone" refers to running "alone", without an external resource manager.
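In practice, a standalone cluster is brought up with the scripts shipped in Spark's `sbin/` directory, and applications connect to it through a `spark://` master URL. A minimal sketch (host name, class, and jar are placeholders):

```shell
# Start the standalone master; its log prints the spark://host:7077 URL
$SPARK_HOME/sbin/start-master.sh

# On each node, start a worker and register it with the master
$SPARK_HOME/sbin/start-worker.sh spark://master-host:7077

# Submit an application to the standalone cluster
$SPARK_HOME/bin/spark-submit \
  --master spark://master-host:7077 \
  --class com.example.MyApp \
  my-app.jar
```

(`start-worker.sh` is the current name of the script; older releases call it `start-slave.sh`. `master-host`, `com.example.MyApp`, and `my-app.jar` are placeholders.)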


153 questions
78 votes, 4 answers

Which cluster type should I choose for Spark?

I am new to Apache Spark, and I just learned that Spark supports three types of cluster: Standalone (Spark manages its own cluster), YARN (using Hadoop's YARN resource manager), and Mesos (Apache's dedicated resource manager project). I think…
David S.
75 votes, 4 answers

What is the relationship between workers, worker instances, and executors?

In Spark Standalone mode, there are master and worker nodes. Here are a few questions: Do 2 worker instances mean one worker node with 2 worker processes? Does every worker instance hold an executor for a specific application (which manages storage,…
edwardsbean
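The worker-vs-executor distinction in the question above maps to configuration: worker processes per node are a cluster-level setting in `conf/spark-env.sh`, while executors are launched per application. A sketch with illustrative values:

```shell
# conf/spark-env.sh on a node (illustrative values)
SPARK_WORKER_INSTANCES=2   # run two worker processes on this node
SPARK_WORKER_CORES=4       # cores each worker process may offer
SPARK_WORKER_MEMORY=8g     # memory each worker process may offer
```

Each application then gets its own executors from those workers, so "2 worker instances" means two worker processes, not two executors.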
40 votes, 3 answers

Apache Spark: Differences between client and cluster deploy modes

TL;DR: In a Spark Standalone cluster, what are the differences between client and cluster deploy modes? How do I set which mode my application is going to run on? We have a Spark Standalone cluster with three machines, all of them with Spark…
Daniel de Paula
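The deploy mode asked about above is chosen with the `--deploy-mode` flag of `spark-submit`: in client mode the driver runs in the submitting process, in cluster mode it is launched on one of the workers. Sketch (host name and jar are placeholders):

```shell
# Driver runs locally in the spark-submit process (the default)
spark-submit --master spark://master-host:7077 --deploy-mode client my-app.jar

# Driver is launched inside the cluster, on one of the workers
spark-submit --master spark://master-host:7077 --deploy-mode cluster my-app.jar
```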
20 votes, 2 answers

Understand Spark: Cluster Manager, Master and Driver nodes

Having read this question, I would like to ask additional questions: The Cluster Manager is a long-running service; on which node does it run? Is it possible for the Master and the Driver nodes to be the same machine? I presume that there…
Rami
13 votes, 3 answers

What happens when Spark master fails?

Does the driver need constant access to the master node? Or is it only required to get the initial resource allocation? What happens if the master is not available after the Spark context has been created? Does it mean the application will fail?
user6022341
10 votes, 4 answers

How do I run multiple Spark applications in parallel on a standalone master

Using a Spark (1.6.1) standalone master, I need to run multiple applications on the same Spark master. All applications submitted after the first one remain in 'WAITING' state. I also observed that the running one holds all cores, the sum across the workers. I…
Sankalp
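The WAITING behaviour described above usually comes from the standalone scheduler's default of offering each application every available core. Capping `spark.cores.max` leaves cores free for later submissions; a sketch (host name, value, and jar are placeholders):

```shell
# Limit this application to 4 cores so others can run alongside it
spark-submit --master spark://master-host:7077 \
  --conf spark.cores.max=4 \
  my-app.jar
```

Alternatively, `spark.deploy.defaultCores` can be set on the master to apply a default cap to applications that do not set `spark.cores.max` themselves.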
9 votes, 1 answer

Spark Standalone Number Executors/Cores Control

So I have a Spark standalone server with 16 cores and 64 GB of RAM. I have both the master and a worker running on the server. I don't have dynamic allocation enabled. I am on Spark 2.0. What I don't understand is when I submit my job and…
theMadKing
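On a standalone cluster, the executor count falls out of two caps rather than being set directly: the total core cap divided by the cores per executor gives the number of executors. Sketch (host name and values are illustrative):

```shell
# 8 total cores at 2 per executor → up to 4 executors
spark-submit --master spark://master-host:7077 \
  --executor-cores 2 \
  --total-executor-cores 8 \
  --executor-memory 4g \
  my-app.jar
```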
7 votes, 0 answers

Spark Streaming - Stopped worker throws FileNotFoundException

I am running a Spark Streaming application on a cluster composed of three nodes, each with one worker and three executors (so a total of 9 executors). I am using Spark standalone mode (version 2.1.1). The application is run with a spark-submit…
7 votes, 3 answers

Running Spark driver program in Docker container - no connection back from executor to the driver?

UPDATE: The problem is resolved. The Docker image is here: docker-spark-submit. I run spark-submit with a fat jar inside a Docker container. My standalone Spark cluster runs on 3 virtual machines: one master and two workers. From an executor log on…
tashoyan
7 votes, 1 answer

How to make Spark driver resilient to Master restarts?

I have a Spark Standalone (not YARN/Mesos) cluster and a driver app running (in client mode), which talks to that cluster to execute its tasks. However, if I shut down and restart the Spark master and workers, the driver does not reconnect to the…
dOxxx
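On master restarts, the standalone master can recover registered workers and applications if its state is persisted; a hedged sketch using ZooKeeper-based recovery (the ZooKeeper hosts are placeholders):

```shell
# conf/spark-env.sh on the master: persist state in ZooKeeper so a
# restarted (or standby) master can recover workers and applications
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181"
```

For drivers submitted in cluster mode, the `--supervise` flag of `spark-submit` additionally asks the master to restart the driver if it exits with a non-zero code.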
6 votes, 1 answer

Spark workers stopped after driver commanded a shutdown

Basically, the master node also acts as one of the slaves. Once the slave on the master completed, it called SparkContext to stop, and this command propagated to all the slaves, which stopped execution in the middle of processing. Error log in one of…
pooshan Singh
5 votes, 0 answers

Spark Standalone vs YARN

What features of YARN make it better than Spark Standalone mode for a multi-tenant cluster running only Spark applications? Maybe besides authentication. There are a lot of answers on Google, but most of them sound wrong to me, so I'm not sure…
VB_
5 votes, 3 answers

PySpark: Not able to create SparkSession (Java Gateway Error)

I have installed PySpark on Windows and had no problems until yesterday. I am using Windows 10, PySpark version 2.3.3 (pre-built version), Java version "1.8.0_201". Yesterday when I tried creating a Spark session, I ran into the below…
5 votes, 0 answers

Spark Streaming - Block replication policy issue in case of multiple executor on the same worker

I am running a Spark Streaming application on a cluster composed of three nodes, each node with a worker and three executors (so a total of 9 executors). I am using Spark version 2.3.2 and the Spark standalone cluster manager. The…
5 votes, 0 answers

Spark streaming job exited abruptly - RECEIVED SIGNAL TERM

The running Spark Streaming job, which is supposed to run continuously, exited abruptly with the following error (found in the executor logs): 2017-07-28 00:19:38,807 [SIGTERM handler] ERROR…