
When I attempt to use SparkR on a Cloud Dataproc cluster (image version 0.2), I get an error like the following:

Exception in thread "main" java.io.FileNotFoundException:
/usr/lib/spark/R/lib/sparkr.zip (Permission denied)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at org.apache.spark.deploy.RPackageUtils$.zipRLibraries(RPackageUtils.scala:215)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:371)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

How can I fix this so I can use SparkR?

James

1 Answer


This issue is due to a bug in the Spark 1.5 series (see the linked JIRA): spark-submit tries to write sparkr.zip into /usr/lib/spark/R/lib, which is not writable by the submitting user. To fix this, run the following command on the master node, either by SSHing into it or by using an initialization action:

sudo chmod 777 /usr/lib/spark/R/lib
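If you would rather bake this into cluster creation, a minimal sketch of an initialization action follows. It assumes the standard Dataproc pattern of reading the node's role from instance metadata via /usr/share/google/get_metadata_value; the script name is hypothetical, and it only changes permissions on the master, where spark-submit runs.

```shell
#!/bin/bash
# Hypothetical init action (e.g. fix-sparkr-perms.sh): relax permissions on
# the SparkR library directory so spark-submit can write sparkr.zip there.
# Dataproc sets the dataproc-role metadata attribute on every node.
ROLE=$(/usr/share/google/get_metadata_value attributes/dataproc-role)
if [[ "${ROLE}" == "Master" ]]; then
  chmod 777 /usr/lib/spark/R/lib
fi
```

Upload the script to a GCS bucket and pass it with --initialization-actions when creating the cluster, e.g. `gcloud dataproc clusters create my-cluster --initialization-actions gs://my-bucket/fix-sparkr-perms.sh` (bucket and cluster names here are placeholders).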

This issue is expected to be fixed in Spark 1.6, which Cloud Dataproc will support in a future image version.
