Questions tagged [google-cloud-dataproc]

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service on Google Cloud Platform. The service provides GUI, CLI, and HTTP API access modes for deploying and managing clusters and for submitting jobs to clusters. This tag can be added to any question related to using or troubleshooting Google Cloud Dataproc.

1136 questions
58 votes, 5 answers

What is the difference between Google Cloud Dataflow and Google Cloud Dataproc?

I am using Google Cloud Dataflow to implement an ETL data warehouse solution. Looking into the Google Cloud offerings, it seems Dataproc can also do the same thing. It also seems Dataproc is a little bit cheaper than Dataflow. Does anybody know the pros /…
48 votes, 6 answers

Google Cloud Platform: how to monitor memory usage of VM instances

I have recently performed a migration to Google Cloud Platform, and I really like it. However, I can't find a way to monitor the memory usage of the Dataproc VM instances. As you can see in the attachment, the console provides utilization info about…
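
The usual resolution: Dataproc VMs do not report memory to the console by default, because memory is collected by the monitoring agent rather than by the hypervisor. Once the agent is installed on the cluster VMs, agent metrics can be queried. A minimal sketch using the Cloud Monitoring Python client (google-cloud-monitoring 2.x assumed); the project ID is a placeholder:

```python
# Hedged sketch: query agent-reported memory usage for a project's VMs.
# Assumes the monitoring agent is installed on the Dataproc VMs and that
# "my-project" is replaced with your project ID.
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
now = time.time()
interval = monitoring_v3.TimeInterval({
    "end_time": {"seconds": int(now)},
    "start_time": {"seconds": int(now - 3600)},  # the last hour
})
series = client.list_time_series(
    request={
        "name": "projects/my-project",
        # the agent's memory metric; not populated without the agent
        "filter": 'metric.type = "agent.googleapis.com/memory/percent_used"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    print(ts.resource.labels["instance_id"], ts.points[0].value.double_value)
```
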
16 votes, 2 answers

Output from Dataproc Spark job in Google Cloud Logging

Is there a way to have the output from Dataproc Spark jobs sent to Google Cloud Logging? As explained in the Dataproc docs, the output from the job driver (the master for a Spark job) is available under Dataproc->Jobs in the console. There are two…
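
One approach that comes up in answers is to log through YARN, since Dataproc forwards YARN container logs to Cloud Logging. A minimal sketch, assuming a PySpark job; the cluster property named in the comment applies to newer image versions and is an assumption here:

```python
# Hedged sketch: route job messages through Spark's log4j logger so they land
# in the YARN container logs, which Dataproc forwards to Cloud Logging.
# Driver output itself can reportedly also be forwarded by creating the cluster
# with dataproc:dataproc.logging.stackdriver.job.driver.enable=true (newer images).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("logging-demo").getOrCreate()
log4j = spark.sparkContext._jvm.org.apache.log4j  # py4j handle into the JVM
logger = log4j.LogManager.getLogger("my.dataproc.job")  # logger name is arbitrary
logger.info("this message goes to the YARN container log")
```
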
14 votes, 3 answers

Where is the Spark UI on Google Dataproc?

What port should I use to access the Spark UI on Google Dataproc? I tried ports 4040 and 7077, as well as a bunch of other ports I found using netstat -pln. The firewall is properly configured.
asked by BAR
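
Because Dataproc runs Spark on YARN, the standalone ports (4040 on the client, 7077 for the master) are not the right ones: the application UI is reached through the YARN ResourceManager on the master node, typically port 8088 (with the Spark history server commonly on 18080), usually over an SSH tunnel rather than an opened firewall. A hedged sketch using a gcloud SOCKS proxy; the cluster name and zone are placeholders:

```python
# Hedged sketch: open a SOCKS proxy to the master node, then browse
# http://mycluster-m:8088 (YARN ResourceManager) through a proxy-configured
# browser to click through to the Spark UI. Blocks until interrupted.
import subprocess

subprocess.run(
    [
        "gcloud", "compute", "ssh", "mycluster-m",  # "<cluster>-m" is the master
        "--zone=us-central1-a",
        "--",          # everything after this is passed to ssh itself
        "-D", "1080",  # local SOCKS proxy port
        "-N",          # no remote command, just the tunnel
    ],
    check=True,
)
```
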
12 votes, 2 answers

Which HBase connector for Spark 2.0 should I use?

Our stack is composed of Google Dataproc (Spark 2.0) and Google Bigtable (HBase 1.2.0), and I am looking for a connector that works with these versions. Spark 2.0 and new Dataset API support are not clear to me for the connectors I have…
12 votes, 3 answers

When submitting a job with pyspark, how do I access static files uploaded with the --files argument?

For example, I have a folder: / - test.py - test.yml, and the job is submitted to the Spark cluster with: gcloud beta dataproc jobs submit pyspark --files=test.yml "test.py". In test.py, I want to access the static file I uploaded. With…
asked by lucemia
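
Files shipped with --files are staged into the YARN container's working directory, so a relative path usually works; SparkFiles is the explicit alternative for files added through the API. A minimal sketch (PyYAML is assumed to be available on the cluster):

```python
# Hedged sketch: two common ways to read a file shipped alongside a PySpark job.
import yaml
from pyspark import SparkContext, SparkFiles

sc = SparkContext()

# 1) Files passed via --files are staged in the container's working
#    directory, so a relative path is often enough:
with open("test.yml") as f:
    config = yaml.safe_load(f)

# 2) For files registered with addFile, resolve the staged path explicitly:
sc.addFile("test.yml")
with open(SparkFiles.get("test.yml")) as f:
    config = yaml.safe_load(f)
```
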
11 votes, 2 answers

Error: permission denied on resource project when launching Dataproc cluster

I was able to successfully launch a Dataproc cluster by manually creating one via gcloud dataproc clusters create.... However, when I try to launch one through a script (that automatically provisions and stops clusters), it says ERROR:…
asked by claudiadast
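
Scripted launches typically run as a service account rather than your own user account, and that account needs a Dataproc role on the project. A hedged sketch of the grant; the project and service-account names are placeholders:

```python
# Hedged sketch: give the script's service account permission to manage
# Dataproc resources in the project. Names below are placeholders.
import subprocess

subprocess.run(
    [
        "gcloud", "projects", "add-iam-policy-binding", "my-project",
        "--member=serviceAccount:provisioner@my-project.iam.gserviceaccount.com",
        "--role=roles/dataproc.editor",
    ],
    check=True,
)
```
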
11 votes, 1 answer

Unable to connect to a Google Storage file using the GCS connector from Spark

I have written a Spark job on my local machine which reads a file from Google Cloud Storage using the Google Hadoop connector (paths like gs://storage.googleapis.com/), as mentioned in https://cloud.google.com/dataproc/docs/connectors/cloud-storage. I have set…
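
When the job runs outside GCP, the connector also needs explicit credentials. A minimal sketch pointing it at a service-account key; the key path is a placeholder, and the json.keyfile property assumes a reasonably recent gcs-connector:

```python
# Hedged sketch: configure the GCS connector with a service-account key for a
# Spark job running outside Google Cloud. Paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("gcs-read").getOrCreate()
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("google.cloud.auth.service.account.enable", "true")
hconf.set("google.cloud.auth.service.account.json.keyfile", "/path/to/key.json")

df = spark.read.text("gs://mybucket/folder/")  # read via the gs:// scheme
df.show(5)
```
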
11 votes, 4 answers

spark.sql.crossJoin.enabled for Spark 2.x

I am using the 'preview' Google Dataproc Image 1.1 with Spark 2.0.0. To complete one of my operations I have to compute a cartesian product. Since version 2.0.0 a Spark configuration parameter has been added (spark.sql.crossJoin.enabled)…
asked by Stijn
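
A minimal sketch of the fix on Spark 2.0: set spark.sql.crossJoin.enabled when building the session, after which an unconditioned join is accepted (Spark 2.1+ also adds an explicit crossJoin method on DataFrames):

```python
# Hedged sketch: allow cartesian products in Spark 2.0 via the new flag.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("cartesian-demo")
    .config("spark.sql.crossJoin.enabled", "true")
    .getOrCreate()
)

left = spark.createDataFrame([(1,), (2,)], ["a"])
right = spark.createDataFrame([("x",), ("y",)], ["b"])

# With the flag on, a join without a condition is a cartesian product.
product = left.join(right)
product.show()
```
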
11 votes, 1 answer

BigQuery connector for pyspark via Hadoop Input Format example

I have a large dataset stored in a BigQuery table and I would like to load it into a pyspark RDD for ETL data processing. I realized that BigQuery supports the Hadoop Input / Output…
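
The pattern from the Dataproc documentation reads the table through the BigQuery connector's Hadoop InputFormat via newAPIHadoopRDD. A sketch with placeholder project and bucket values, reading a public sample table:

```python
# Hedged sketch: load a BigQuery table as an RDD through the BigQuery
# connector's Hadoop InputFormat. Project and bucket names are placeholders.
import json
from pyspark import SparkContext

sc = SparkContext()
conf = {
    "mapred.bq.project.id": "my-project",
    "mapred.bq.gcs.bucket": "my-staging-bucket",  # scratch space for the export
    "mapred.bq.temp.gcs.path": "gs://my-staging-bucket/bq-tmp",
    "mapred.bq.input.project.id": "publicdata",
    "mapred.bq.input.dataset.id": "samples",
    "mapred.bq.input.table.id": "shakespeare",
}
table = sc.newAPIHadoopRDD(
    "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "com.google.gson.JsonObject",
    conf=conf,
)
# each record arrives as (row number, JSON string); parse into dicts for ETL
rows = table.map(lambda kv: json.loads(kv[1]))
print(rows.take(3))
```
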
10 votes, 3 answers

Guava version while using spark-shell

I'm trying to use the spark-cassandra-connector via spark-shell on Dataproc; however, I am unable to connect to my cluster. It appears that there is a version mismatch, since the classpath is including a much older Guava version from somewhere else,…
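
A workaround often suggested for such conflicts is to prefer user-supplied jars over the cluster's classpath. A hedged sketch; the connector coordinate is only an example, and both userClassPathFirst flags are experimental in Spark:

```python
# Hedged sketch: launch spark-shell preferring user-supplied jars over the
# cluster's (older) Guava. The package coordinate below is an example version.
import subprocess

subprocess.run(
    [
        "spark-shell",
        "--packages", "datastax:spark-cassandra-connector:2.0.1-s_2.11",
        "--conf", "spark.driver.userClassPathFirst=true",
        "--conf", "spark.executor.userClassPathFirst=true",
    ],
    check=True,
)
```
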
10 votes, 1 answer

Incorrect memory allocation for Yarn/Spark after automatic setup of Dataproc Cluster

I'm trying to run Spark jobs on a Dataproc cluster, but Spark will not start due to YARN being misconfigured. I receive the following error when running "spark-shell" from the shell (locally on the master), as well as when uploading a job through…
asked by habitats
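
If the values Dataproc computed are wrong for a use case, they can be overridden at cluster creation time with --properties (the yarn: prefix targets yarn-site.xml, spark: targets spark-defaults.conf). A sketch with placeholder names and sizes:

```python
# Hedged sketch: create a cluster with explicit YARN/Spark memory settings.
# Cluster name, region, and the memory figures are placeholders.
import subprocess

subprocess.run(
    [
        "gcloud", "dataproc", "clusters", "create", "my-cluster",
        "--region=us-central1",
        "--properties=" + ",".join([
            "yarn:yarn.nodemanager.resource.memory-mb=12288",
            "yarn:yarn.scheduler.maximum-allocation-mb=12288",
            "spark:spark.executor.memory=4g",
        ]),
    ],
    check=True,
)
```
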
10 votes, 2 answers

Dataproc + BigQuery examples - any available?

According to the Dataproc docs, it has "native and automatic integrations with BigQuery". I have a table in BigQuery. I want to read that table and perform some analysis on it using the Dataproc cluster that I've created (using a PySpark job). Then…
10 votes, 3 answers

"No Filesystem for Scheme: gs" when running spark job locally

I am running a Spark job (version 1.2.0), and the input is a folder inside a Google Cloud Storage bucket (i.e. gs://mybucket/folder). When running the job locally on my Mac machine, I get the following error: 5932 [main] ERROR…
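
The gs:// scheme is preconfigured on Dataproc but not in a local Spark install: the GCS connector jar must be on the classpath and the filesystem implementation registered. A sketch against a modern PySpark (the question itself used 1.2.0); the jar path is a placeholder:

```python
# Hedged sketch for local runs: ship the GCS connector jar and register the
# gs:// filesystem classes so Hadoop can resolve the scheme.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("local-gcs")
    .config("spark.jars", "/path/to/gcs-connector-hadoop2-latest.jar")
    .getOrCreate()
)
hconf = spark.sparkContext._jsc.hadoopConfiguration()
hconf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hconf.set("fs.AbstractFileSystem.gs.impl",
          "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")

print(spark.read.text("gs://mybucket/folder").count())
```
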
8 votes, 1 answer

How to install python packages in a Google Dataproc cluster

Is it possible to install python packages in a Google Dataproc cluster after the cluster is created and running? I tried to use "pip install xxxxxxx" in the master command line but it does not seem to work. Google's Dataproc documentation does not…
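
The commonly recommended route is an initialization action that runs on every node at cluster creation (for an already-running cluster, the install has to be repeated on each node, not just the master). A hedged sketch; the bucket path and package list are placeholders:

```python
# Hedged sketch: install Python packages cluster-wide via an initialization
# action. The script at the placeholder gs:// path would contain, e.g.:
#   #!/bin/bash
#   pip install numpy pandas
import subprocess

subprocess.run(
    [
        "gcloud", "dataproc", "clusters", "create", "my-cluster",
        "--region=us-central1",
        "--initialization-actions=gs://my-bucket/install-python-deps.sh",
    ],
    check=True,
)
```
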