I'm trying to run Spark jobs on a Dataproc cluster, but Spark fails to start because YARN is misconfigured.

I get the following error when running "spark-shell" from the shell (locally on the master), as well as when submitting a job through the web GUI or the gcloud command-line utility from my local machine:

15/11/08 21:27:16 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: Required executor memory (38281+2679 MB) is above the max threshold (20480 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.

I tried modifying the value in /etc/hadoop/conf/yarn-site.xml, but it didn't change anything; I don't think the configuration is actually pulled from that file.
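For completeness, this is where I looked; I'm guessing YARN only reads this file when the ResourceManager starts, so editing it alone wouldn't take effect without a restart (the service name below is just my assumption for the Dataproc image):

    # Check the value currently set on the master
    grep -A1 yarn.scheduler.maximum-allocation-mb /etc/hadoop/conf/yarn-site.xml

    # Presumably the ResourceManager would need a restart to pick up an edit
    sudo service hadoop-yarn-resourcemanager restart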

I've tried multiple cluster configurations, at multiple sites (mainly in Europe), and I only got this to work with the low-memory machine type (4 cores, 15 GB memory).

In other words, this is only a problem on nodes configured with more memory than the YARN default allows.

habitats

1 Answer

Sorry about these issues you're running into! It looks like this is part of a known issue where certain memory settings end up being computed based on the master machine's size rather than the worker machines' size; we're hoping to fix this in an upcoming release.

There are two current workarounds:

  1. Use a master machine type with memory equal to or smaller than that of the worker machine type.
  2. Explicitly set spark.executor.memory and spark.executor.cores, either with the --conf flag if running from an SSH connection:

    spark-shell --conf spark.executor.memory=4g --conf spark.executor.cores=2
    

    or if running gcloud beta dataproc, use --properties:

    gcloud beta dataproc jobs submit spark --properties spark.executor.memory=4g,spark.executor.cores=2
    

You can adjust the number of cores and the memory per executor as necessary; it's fine to err on the side of smaller executors and let YARN pack lots of executors onto each worker. That said, you can save some per-executor overhead by setting spark.executor.memory to the full size available in each YARN container and spark.executor.cores to all the cores in each worker.
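As a rough sizing sketch (the numbers are illustrative, and the overhead fraction is just inferred from the 38281+2679 MB figures in the error above): YARN rejects a request once spark.executor.memory plus Spark's per-executor memory overhead exceeds yarn.scheduler.maximum-allocation-mb, so leave a little headroom under the container limit:

    # The error above had a 20480 MB container limit, and the overhead was roughly
    # 7% of executor memory (2679 MB on top of 38281 MB). 18g plus ~7% overhead
    # stays under 20480 MB; the cores value assumes 4-core workers (illustrative).
    spark-shell \
      --conf spark.executor.memory=18g \
      --conf spark.executor.cores=4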

EDIT: As of January 27th, new Dataproc clusters are configured correctly for any combination of master/worker machine types, as mentioned in the release notes.

Dennis Huo
  • Holy moly! Master memory size less than worker was probably the only combination I had not yet tried. Thanks a lot! It worked like a charm :) – habitats Nov 08 '15 at 22:44
  • It looks like this change introduced a new problem. I get [the following error](http://i.imgur.com/5UVnFJP.png) when running on this new configuration. The low-memory cluster has no problems with the identical .jar job. – habitats Nov 08 '15 at 23:35
  • Should I post it as a new problem? – habitats Nov 08 '15 at 23:37
  • Yeah, probably best to post as a new question, more people will look at it that way. – Dennis Huo Nov 09 '15 at 01:38
  • Since you provided the job id in the screenshot, I went ahead and checked your settings; it looks like it's another manifestation of the same problem in that if you use a n1-highcpu-16 master with n1-highmem-4 workers, there's a bug right now which sets spark.executor.cores=8. If you set up the [socksproxy](https://cloud.google.com/dataproc/cluster-web-interfaces#connecting_to_the_web_interfaces) and visit :8088 and then click through to a failed Spark application, you'll see an error like: – Dennis Huo Nov 09 '15 at 02:08
  • Uncaught exception: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=8, maxVirtualCores=4 – Dennis Huo Nov 09 '15 at 02:08
  • In the future we should have all these settings just work out of the box; in the meantime, if you still need more cpus on the master than workers, and you've chosen machine types so that the highcpu master has less/equal memory compared to highmem workers, you should also provide the `--conf spark.executor.cores=2` setting to give each executor half the number of cores available on each worker (or `--properties spark.executor.cores=2` if using `gcloud beta dataproc jobs`; in this case the flag has to come *before* the jarfile in the arguments) – Dennis Huo Nov 09 '15 at 02:11
  • This worked! Again thanks a whole lot for the help. You made my day:) – habitats Nov 09 '15 at 06:49
  • Apologies for poking at an old question, but the [newest release](https://cloud.google.com/dataproc/release-notes/service) for Google Cloud Dataproc has a fix for this issue and also calls out this question. Cheers! – James Feb 01 '16 at 21:54