Questions tagged [apache-spark-standalone]

Use for questions related to Apache Spark standalone deploy mode (not local mode).

This tag should be used for questions specific to standalone deploy mode. Questions might include cluster orchestration in standalone mode, or standalone-specific features and configuration options.

Spark standalone mode is an alternative to running Spark on Mesos or YARN. It provides a simpler option than the more sophisticated resource managers, which can be useful on a dedicated Spark cluster (i.e. one not running other jobs).

"Standalone" speaks to the nature of running "alone" without an external resource manager.

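For orientation, here is a minimal sketch of an application attaching to a standalone master rather than YARN or Mesos; the hostname is a placeholder and 7077 is the default standalone master port.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: attach an application to a standalone master (not local mode,
// not YARN/Mesos). "master-host" is a placeholder hostname.
object StandaloneHello {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("standalone-hello")
      .master("spark://master-host:7077")
      .getOrCreate()

    // Tiny job just to verify the cluster executes work.
    println(spark.sparkContext.parallelize(1 to 100).sum())

    spark.stop()
  }
}
```

In practice the master URL is usually supplied with spark-submit --master rather than hard-coded in the application.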

153 questions
3 votes, 3 answers

Why is Spark utilizing only one core per executor? How does it decide to utilize cores other than by the number of partitions?

I am running Spark in an HPC environment on Slurm, using Spark standalone mode, Spark version 1.6.1. The problem is that my Slurm node is not fully used in Spark standalone mode. I am using spark-submit in my Slurm script. There are 16 cores available on…
Laeeq • 282
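On the question above: core usage in standalone mode is usually governed by a handful of settings plus the partition count. The sketch below shows those knobs; the master URL, input path and values are placeholders, not taken from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch of the settings that usually govern core usage on a standalone cluster.
// All values here are placeholders for illustration.
object CoreUsageSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setMaster("spark://master-host:7077") // placeholder standalone master URL
      .setAppName("core-usage-sketch")
      .set("spark.cores.max", "16")          // total cores this application may claim
      .set("spark.executor.cores", "4")      // cores per executor process

    val sc = new SparkContext(conf)

    // Parallelism is also capped by the number of partitions: a 16-core
    // allocation still sits idle if the RDD only has a couple of partitions.
    val rdd = sc.textFile("hdfs:///path/to/input").repartition(16)
    println(rdd.count())

    sc.stop()
  }
}
```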
3 votes, 4 answers

Spark master won't show running application in UI when I use spark-submit for python script

The image shows the 8081 UI. The master shows a running application when I start a Scala shell or PySpark shell. But when I use spark-submit to run a Python script, the master doesn't show any running application. This is the command I used: spark-submit…
kaks • 669
3 votes, 1 answer

Forcing driver to run on specific slave in spark standalone cluster running with "--deploy-mode cluster"

I am running a small Spark cluster with two EC2 instances (m4.xlarge). So far I have been running the Spark master on one node, and a single Spark slave (4 cores, 16g memory) on the other, then deploying my Spark (streaming) app in client…
Adam Dossa • 228
3 votes, 1 answer

Is FAIR available for Spark Standalone cluster mode?

I have a 2-node cluster with the Spark standalone cluster manager. I'm triggering more than one job using the same sc with Scala multithreading. What I found is that my jobs are scheduled one after another because of the FIFO nature, so I tried to use FAIR…
Balaji Reddy • 4,901
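On the question above: FAIR scheduling applies to jobs submitted within a single application (one SparkContext), and it works on a standalone cluster too. A minimal sketch; the pool name and workload are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: FAIR scheduling of concurrent jobs inside one application.
// The cluster manager (standalone here) does not decide this; the
// in-application scheduler does.
object FairSchedulingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("fair-scheduling-sketch")
      .set("spark.scheduler.mode", "FAIR")

    val sc = new SparkContext(conf)

    // Each thread tags its jobs with a scheduler pool so they can progress concurrently.
    val worker = new Thread {
      override def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", "pool1") // illustrative pool name
        sc.parallelize(1 to 1000000).count()
      }
    }
    worker.start()
    worker.join()

    sc.stop()
  }
}
```

Pool definitions (weights, minimum shares) can additionally be placed in a fair scheduler XML file referenced via spark.scheduler.allocation.file.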
3 votes, 2 answers

How many executor processes run for each worker node in spark?

How many executors will be launched for each worker node in Spark? Can I know the math behind it? For example, I have 6 worker nodes and 1 master; if I submit a job through spark-submit, what is the maximum number of executors that will be launched for…
AKC • 853
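On the question above, a back-of-the-envelope sketch: in standalone mode each worker hosts, by default, one executor per application using all of that worker's cores; if spark.executor.cores is set, a worker can host several smaller executors. The cores per worker and configuration values below are assumptions for illustration, not taken from the question.

```scala
// Paste into any Scala REPL; pure arithmetic, no Spark required.
val workers        = 6   // worker nodes, as in the question
val coresPerWorker = 8   // assumption for illustration
val executorCores  = 2   // spark.executor.cores (assumed)
val coresMax       = 24  // spark.cores.max for this application (assumed)

val executorsPerWorker = coresPerWorker / executorCores   // 8 / 2 = 4
val byWorkers          = workers * executorsPerWorker     // 6 * 4 = 24
val byCoreCap          = coresMax / executorCores         // 24 / 2 = 12
val maxExecutors       = math.min(byWorkers, byCoreCap)   // 12

println(s"At most $maxExecutors executors would be launched for this application")
```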
3 votes, 1 answer

java.lang.IllegalStateException: Cannot find any build directories

I want to run the Spark master and worker in IntelliJ. I have started the Spark master and worker successfully. The worker is also connected to the master without any problem. I can confirm this by looking at the logs and the Spark web UI. But the problem starts…
3 votes, 2 answers

How to run a Spark job on specific nodes

For example, my Spark cluster has 100 nodes (workers); when I run one job I just want it to run on some 10 specific nodes. How should I achieve this? BTW, I'm using Spark standalone mode. Why do I need the above requirement: one of my Spark job…
Jack • 4,626
3 votes, 3 answers

Continuously INFO JobScheduler:59 - Added jobs for time *** ms in my Spark Standalone Cluster

We are working with a Spark standalone cluster, 3 nodes each with 8 cores and 32 GB RAM. Sometimes the streaming batch completes in less than 1 second; sometimes it takes more than 10 seconds, and at those times the log below appears…
3 votes, 1 answer

Role of the Executors on the Spark master machine

In a Spark standalone cluster, does the master node run tasks as well? I wasn't sure whether executor processes are spun up on the master node and do work alongside the worker nodes. Thanks!
Ranjit Iyer • 797
2 votes, 1 answer

Spark Standalone : how to avoid sbt assembly and uber-jar?

I have a build.sbt like this, to do Spark programming: libraryDependencies ++= Seq( "org.apache.spark" %% "spark-core" % "3.0.1" withSources(), "com.datastax.spark" %% "spark-cassandra-connector" % "3.0.0" withSources() ... ) As my program use…
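On the question above, one common approach (a sketch, not necessarily what the asker needs): mark the Spark artifacts as provided, since a standalone cluster already ships them, and let spark-submit resolve the remaining dependencies so no uber-jar is assembled. Versions below mirror the question and are otherwise assumptions.

```scala
// build.sbt sketch: Spark itself comes from the standalone cluster ("provided"),
// so only a thin application jar is built; no sbt-assembly step is needed.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.0.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "3.0.1" % "provided"
)
```

The Cassandra connector can then be pulled in at submit time with --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.0 (or the spark.jars.packages property), so the cluster fetches it instead of it being bundled into the application jar.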
2 votes, 1 answer

Output Spark application name in driver log

I need to output the Spark application name (spark.app.name) in each line of the driver log (along with other attributes like message and date). So far I failed to find the correct log4j configuration or any other hints. How could it be done? I…
Valentina • 409
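On the question above, one possible route (a sketch assuming log4j 1.x, which older Spark builds bundle): put spark.app.name into the logging MDC early in the driver and reference it from the pattern layout with %X{appName}. The key name appName is illustrative.

```scala
import org.apache.log4j.{Logger, MDC}
import org.apache.spark.sql.SparkSession

// Sketch: expose spark.app.name to every subsequent driver log line via the MDC.
// A log4j.properties pattern such as "%d %p %X{appName} %m%n" would then print it.
object AppNameInDriverLog {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("app-name-logging").getOrCreate()

    // Populate the MDC before the application starts logging its own messages.
    MDC.put("appName", spark.sparkContext.getConf.get("spark.app.name"))

    val log = Logger.getLogger(getClass)
    log.info("driver started")

    spark.stop()
  }
}
```

Lines logged by Spark before this point, and executor logs, are not covered by this trick.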
2 votes, 1 answer

Apache Spark method not found sun.nio.ch.DirectBuffer.cleaner()Lsun/misc/Cleaner;

I encounter this problem while running an automated data processing script in spark-shell. The first couple of iterations work fine, but sooner or later it bumps into this error. I googled this issue but haven't found an exact match. Other…
2 votes, 1 answer

spark.master configuration via REST job submission in standalone cluster is ignored

I have a standalone Spark cluster in HA mode (2 masters) and a couple of workers registered there. I submitted the Spark job via the REST interface with the following details: { "sparkProperties": { "spark.app.name": "TeraGen3", …
kans • 23
2 votes, 0 answers

Buffer/cache exhaustion Spark standalone inside a Docker container

I have a very weird memory issue (which is what a lot of people will most likely say ;-)) with Spark running in standalone mode inside a Docker container. Our setup is as follows: We have a Docker container in which we have a Spring boot…
2 votes, 1 answer

Simple Spark job fails due to GC overhead limit

I've created a standalone Spark (2.1.1) cluster on my local machines with 9 cores / 80 GB each (a total of 27 cores / 240 GB RAM). I've got a sample Spark job that sums all the numbers from 1 to x; this is the code: package com.example import…
Y. Eliash • 1,090
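On the question above, and without the full code this is only a guess: one pattern that keeps this kind of "sum 1 to x" job light on the driver is to generate the numbers on the executors with sc.range rather than from a driver-side collection. A sketch, with an illustrative x:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: distributed generation and summation of 1..x. Nothing proportional
// to x is materialized on the driver; only the final Long comes back.
object SumToX {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sum-to-x"))

    val x = 1000000000L                 // illustrative size, not from the question
    val total = sc.range(1L, x + 1).reduce(_ + _)

    println(s"sum(1..$x) = $total")
    sc.stop()
  }
}
```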