Questions tagged [google-cloud-dataproc]

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig and Hive service on Google Cloud Platform. The service provides GUI, CLI and HTTP API access modes for deploying/managing clusters and submitting jobs onto clusters.

This tag can be added to any questions related to using/troubleshooting Google Cloud Dataproc.

1136 questions
0
votes
2 answers

Google Cloud Dataproc migration to Spark 1.6.0

Will Google Dataproc start using Spark 1.6.0 anytime soon? I'm creating a cluster using this command: gcloud beta dataproc clusters create and it defaults to using Spark 1.5.2. Thanks.
femibyte
  • 2,585
  • 4
  • 27
  • 51
0
votes
1 answer

SparkR on Dataproc (Spark 1.5.x) does not work

When I attempt to use SparkR on a Cloud Dataproc cluster (version 0.2) I get an error like the following: Exception in thread "main" java.io.FileNotFoundException: /usr/lib/spark/R/lib/sparkr.zip (Permission denied) at…
James
  • 2,181
  • 11
  • 26
0
votes
1 answer

Accessing Cassandra from Google Cloud Dataproc

I just set up a Spark cluster in Google Cloud using Dataproc and I have a standalone installation of Cassandra running on a separate VM. I would like to install the Datastax spark-cassandra connector so I can connect to Cassandra from Spark. How can…
femibyte
  • 2,585
  • 4
  • 27
  • 51
0
votes
1 answer

Cloud Dataproc API discovery

I want to programmatically build against the Cloud Dataproc API but cannot find the discovery service. Where can I find it?
James
  • 2,181
  • 11
  • 26
0
votes
1 answer

Using Presto on Cloud Dataproc with Google Cloud SQL?

I use both Hive and MySQL (via Google Cloud SQL) and I want to use Presto to connect to both easily. I have seen there is a Presto initialization action for Cloud Dataproc but it does not work with Cloud SQL out of the box. How can I get that…
James
  • 2,181
  • 11
  • 26
0
votes
2 answers

How can I run Presto on Google Cloud Dataproc?

I want to run Presto on a Dataproc instance or on Google Cloud Platform in general. How can I easily set up and install Presto, especially with Hive?
James
  • 2,181
  • 11
  • 26
0
votes
1 answer

java.lang.VerifyError when using S3 connector with Cloud Dataproc

I am trying to use the S3 connector with Google Cloud Dataproc and I am encountering a java.lang.VerifyError. This seems to occur on a brand new cluster which I have not modified. Here is an example: $ hadoop fs -ls s3:/// Exception in…
0
votes
1 answer

Service account errors when using Cloud Dataproc

It seems like using service accounts with Dataproc is not possible because when using a service account with Cloud Dataproc I'm getting permission errors. For example, running the command gcloud beta dataproc clusters list yields an error which says…
James
  • 2,181
  • 11
  • 26
-1
votes
1 answer

Pyspark crashing on Dataproc cluster for small dataset

I am running a Jupyter notebook created on a GCP Dataproc cluster consisting of 3 worker nodes and 1 master node of type "N1-standard2" (2 cores, 7.5GB RAM), for my data science project. The dataset consists of ~0.4 million rows. I have called a groupBy…
-2
votes
1 answer

dataproc rename files written by spark in GCS folder

I am using Dataproc to implement Spark jobs in Scala. The aim of my Spark job is to read data from GCS, apply some transformations, and then write the resulting data back to GCS. The files written by Spark are named PART-00, and I want to rename them, but I…
scalacode
  • 759
  • 9
  • 20
-2
votes
1 answer

How can I pass config file parameters in Google Cloud Platform Spark Scala jobs?

I have a Spark Scala job deployed on a GCP Dataproc cluster. How can I pass a config file as a parameter to the Spark Submit query using the Web UI?
Chetan SP
  • 172
  • 2
  • 10