Questions tagged [google-cloud-dataproc]

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig and Hive service on Google Cloud Platform. The service provides GUI, CLI and HTTP API access modes for deploying/managing clusters and submitting jobs onto clusters. This tag can be added to any questions related to using/troubleshooting Google Cloud Dataproc.

1136 questions
0 votes, 2 answers

Proxying Resource Manager in Google Dataproc

I've followed Google's instructions on this: gcloud compute ssh --zone=us-central1-b --ssh-flag="-D 8088" --ssh-flag="-N" --ssh-flag="-n" spark-test-m followed by /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome…
J.Fratzke • 1,067
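
For context, the full pattern from Google's cluster web-interfaces guide has two steps: an SSH SOCKS tunnel, then a Chrome instance routed through it that browses to the master's hostname (http://spark-test-m:8088), not localhost. A hedged sketch, scripted in Python for convenience — cluster name, zone, and port are taken from the question, the Chrome flags from the guide:

    # Hedged sketch: SSH SOCKS tunnel to the master plus a proxied Chrome.
    import subprocess

    MASTER = "spark-test-m"
    ZONE = "us-central1-b"
    SOCKS_PORT = 8088  # any free local port works

    # Step 1: dynamic port forwarding (-D) with no remote command (-N, -n).
    tunnel = subprocess.Popen([
        "gcloud", "compute", "ssh",
        "--zone=" + ZONE,
        "--ssh-flag=-D %d" % SOCKS_PORT,
        "--ssh-flag=-N",
        "--ssh-flag=-n",
        MASTER,
    ])

    # Step 2: a fresh Chrome profile that resolves hostnames through the
    # proxy, so http://spark-test-m:8088 reaches the ResourceManager.
    chrome = subprocess.Popen([
        "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
        "--proxy-server=socks5://localhost:%d" % SOCKS_PORT,
        "--host-resolver-rules=MAP * 0.0.0.0 , EXCLUDE localhost",
        "--user-data-dir=/tmp/dataproc-ui-profile",
    ])
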
0 votes, 1 answer

Scheduled MapReduce job on Google Cloud Platform

I'm developing a Node.js application that basically stores user event logs in a database and shows insights about user actions. To achieve this, the event logs must be analyzed by a MapReduce job which would run automatically once a day (every…
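
One common shape for this, sketched under assumptions (project, region, cluster, bucket, and jar names below are placeholders): keep or recreate a Dataproc cluster and let a daily cron task submit the MapReduce job through the Dataproc REST API with the google-api-python-client:

    # Hedged sketch: submit a Hadoop job to an existing cluster; a daily
    # cron entry (App Engine cron, crontab, etc.) would invoke this script.
    from googleapiclient.discovery import build

    PROJECT = "my-project"   # placeholder
    REGION = "global"        # Dataproc's default region at the time
    CLUSTER = "my-cluster"   # placeholder

    dataproc = build("dataproc", "v1")

    job_details = {
        "projectId": PROJECT,
        "job": {
            "placement": {"clusterName": CLUSTER},
            "hadoopJob": {
                "mainJarFileUri": "gs://my-bucket/jobs/analyze-logs.jar",
                "args": ["gs://my-bucket/logs/", "gs://my-bucket/insights/"],
            },
        },
    }

    result = dataproc.projects().regions().jobs().submit(
        projectId=PROJECT, region=REGION, body=job_details).execute()
    print(result["reference"]["jobId"])
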
0 votes, 0 answers

Error Installing Oozie on Dataproc

I was first using a Dataproc initialization script provided by Google (here) to install Oozie on a new cluster, and noticed that I couldn't hit the UI or run jobs on the command line. While diagnosing, I went ahead and deleted the cluster, then recreated a…
Khirok • 507
0 votes, 2 answers

Dataproc + Python package: Distribute updated versions

Currently I am developing a Spark application on Google Dataproc, and I frequently need to update the Python package. During provisioning I run the following commands: echo "Downloading and extracting source code..." gsutil cp…
Frank • 387
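
Rather than reprovisioning the cluster for every package update, one option (a minimal sketch; the bucket path and package name are hypothetical) is to upload the updated package zip to GCS and ship it with each job via SparkContext.addPyFile, so a gsutil cp plus a resubmit picks up the new version:

    # Hedged sketch: distribute an updated package zip at job-submit time.
    from pyspark import SparkContext

    sc = SparkContext()

    # addPyFile stages the archive on the driver and executors and puts it
    # on sys.path, so no cluster-level reinstall is needed.
    sc.addPyFile("gs://my-bucket/deps/mypkg.zip")  # hypothetical path

    import mypkg  # hypothetical package, importable after addPyFile

The same effect is available at submit time with gcloud dataproc jobs submit pyspark ... --py-files gs://my-bucket/deps/mypkg.zip.
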
0 votes, 1 answer

Apache Spark job runs locally but throwing null pointer on Google Cloud Cluster

I have an Apache Spark application that I have, until now, been running/testing on my local machine using the command: spark --class "main.SomeMainClass" --master local[4] jarfile.jar Everything runs fine; however, when I submit this very same job…
MichaelDD • 716
0 votes, 1 answer

Read a file in Spark jobs from Google Cloud Platform

I'm using Spark on Google Cloud Platform. I'm reading a file from the filesystem gs:///dir/file, but the log output reports FileNotFoundException: gs:/bucket/dir/file (No such file or directory). The missing / is obviously the…
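
For reference, a GCS URI carries the bucket name in its authority component, so the connector can only resolve gs://<bucket>/<path>; in gs:///dir/file the bucket is empty. A minimal sketch with a placeholder bucket name:

    # Minimal sketch: include the bucket name in the gs:// URI.
    from pyspark import SparkContext

    sc = SparkContext()
    rdd = sc.textFile("gs://my-bucket/dir/file")  # not gs:///dir/file
    print(rdd.count())
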
0 votes, 1 answer

Request had insufficient authentication scopes [403] when creating a cluster with Google Cloud Dataproc

In Google Cloud Platform the Dataproc API is enabled. I am using the same key I use to access GCS and BigQuery to create a new cluster per this example. I get a Request had insufficient authentication scopes error on the following line. Operation…
PUG • 3,867
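
This 403 typically means the credential was built without a scope that covers Dataproc. A hedged sketch (the key file path is a placeholder): request the broad cloud-platform scope, which covers GCS, BigQuery, and Dataproc alike, when constructing the client:

    # Hedged sketch: build the Dataproc client with an explicit scope.
    from google.oauth2 import service_account
    from googleapiclient.discovery import build

    credentials = service_account.Credentials.from_service_account_file(
        "/path/to/key.json",  # placeholder
        scopes=["https://www.googleapis.com/auth/cloud-platform"],
    )

    dataproc = build("dataproc", "v1", credentials=credentials)

Note that when the code runs on a GCE VM with default credentials, the VM's own access scopes can impose the same limit, so the instance may need to be recreated with broader scopes.
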
0 votes, 1 answer

Where does Google Dataproc store Spark logs on disk?

I'd like to get command-line access to the live logs produced by my Spark app when I'm SSH'd into the master node (the machine hosting the Spark driver program). I'm able to see them using gcloud dataproc jobs wait, the Dataproc web UI, and in GCS,…
Jon Chase • 413
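
An assumption worth verifying on your image version: Spark executor stdout/stderr end up in the YARN NodeManager container logs, which Dataproc keeps under /var/log/hadoop-yarn/userlogs on the worker nodes. A small speculative sketch for listing them on a node:

    # Speculative sketch: list per-container Spark logs on a Dataproc node.
    # The log root is an assumption; check yarn.nodemanager.log-dirs if empty.
    import pathlib

    LOG_ROOT = pathlib.Path("/var/log/hadoop-yarn/userlogs")

    for app_dir in sorted(LOG_ROOT.glob("application_*")):
        for log_file in sorted(app_dir.rglob("*")):
            if log_file.is_file():  # stdout/stderr per container
                print(log_file)
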
0 votes, 1 answer

What is the best way to minimize the initialization time for Apache Spark jobs on Google Dataproc?

I am trying to use a REST service to trigger Spark jobs via the Dataproc API client. However, each job inside the Dataproc cluster takes 10-15 seconds to initialize the Spark driver and submit the application. I am wondering if there is an effective way to…
pashupati • 87
0 votes, 0 answers

Dataproc failed to read Parquet file in Google Cloud Storage

I have a Parquet file in Google Cloud Storage and try to read it as below: val parquetFile = sqlContext.read.parquet("gs://eng_sandbox1/shaw/testparquet/part-r-00000-b4aecbee-724e-40ea-b868-95f7e3f758a7.gz.parquet") However, I encountered the…
Shaw Ou • 11
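
The error itself is cut off, but a common first check (sketched below with the paths from the question, so treat it as illustrative, not a confirmed fix) is to point the reader at the dataset directory rather than a single part file, and to confirm the cluster's service account can read the bucket:

    # Hedged sketch: read the whole Parquet dataset directory.
    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext()
    sqlContext = SQLContext(sc)

    df = sqlContext.read.parquet("gs://eng_sandbox1/shaw/testparquet/")
    df.printSchema()
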
0 votes, 1 answer

Google Dataproc API Spark cluster with C#

I have data in BigQuery that I want to run analytics on in a Spark cluster. Per the documentation, if I instantiate a Spark cluster it should come with a BigQuery connector. I was looking for sample code to do this and found one in PySpark. I could not find…
PUG • 3,867
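
There is no first-party C# Spark binding, so the usual route is to write the job itself in a JVM language or Python and drive it from C# through the Dataproc REST API. For reference, the PySpark sample the asker mentions uses the Hadoop BigQuery connector roughly like this (project and bucket names are placeholders; the input table is the public shakespeare sample):

    # Hedged sketch of the documented PySpark BigQuery-connector pattern.
    from pyspark import SparkContext

    sc = SparkContext()

    conf = {
        # The connector stages BigQuery exports through this GCS bucket.
        "mapred.bq.project.id": "my-project",
        "mapred.bq.gcs.bucket": "my-staging-bucket",
        "mapred.bq.temp.gcs.path": "gs://my-staging-bucket/bq_tmp/",
        "mapred.bq.input.project.id": "publicdata",
        "mapred.bq.input.dataset.id": "samples",
        "mapred.bq.input.table.id": "shakespeare",
    }

    table_data = sc.newAPIHadoopRDD(
        "com.google.cloud.hadoop.io.bigquery.JsonTextBigQueryInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "com.google.gson.JsonObject",
        conf=conf,
    )
    print(table_data.take(1))
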
0 votes, 1 answer

Dataproc MapReduce stopped working

I run the standard HBase class for counting rows (RowCounter) on a Bigtable table. The Dataproc GUI in the Google Console is used. It worked fine, but after a few weeks I tried to run a similar jar and the job fails for a hardly explainable reason. This doesn't look like…
Daneel Yaitskov • 4,019
0 votes, 1 answer

Google Cloud Dataproc - job file erroring on sc.textFile() command

Here is the file that I submit as a PySpark job in Dataproc, through the UI: # Load file data from Google Cloud Storage to the Dataproc cluster, creating an RDD # Because Spark transforms are 'lazy', we do a 'count()' action to make sure # we…
Thom Rogers • 1,137
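
For comparison, a complete, minimal version of that job file (the bucket and path are placeholders). A PySpark file submitted to Dataproc has to create its own SparkContext:

    # Minimal sketch of the pattern described in the question.
    from pyspark import SparkContext

    sc = SparkContext()

    # Load file data from Google Cloud Storage into an RDD. Because Spark
    # transforms are lazy, run a count() action to force the read and
    # surface any path or permission errors immediately.
    rdd = sc.textFile("gs://my-bucket/data/input.txt")
    print(rdd.count())
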
0 votes, 1 answer

exec sh from PySpark

I'm trying to run a .sh file, loaded from a .py file, in a PySpark job, but I always receive a message saying that the .sh file is not found. This is my code: test.py: import os,sys os.system("sh ./check.sh") and my gcloud command: gcloud beta dataproc…
sergio • 21
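
The working directory of a submitted job generally doesn't contain check.sh, so ./check.sh isn't found. A hedged sketch: ship the script with the job (gcloud dataproc jobs submit pyspark test.py --files check.sh) and resolve the staged copy through SparkFiles instead of a relative path:

    # Hedged sketch: run a shell script shipped alongside the PySpark job.
    import subprocess

    from pyspark import SparkContext, SparkFiles

    sc = SparkContext()

    script_path = SparkFiles.get("check.sh")  # absolute path to staged copy
    subprocess.check_call(["sh", script_path])
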
0 votes, 1 answer

Accessing data in Google storage for Apache Spark SQL

I have about 30 GB of data in Cloud Storage that I would like to query using Apache Hive from a Dataproc cluster. What's the best strategy for accessing this data? Is the best approach to copy the data to my master via gsutil and access it from…
femibyte • 2,585
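
Copying to the master is usually unnecessary: the GCS connector exposes gs:// as a Hadoop filesystem, so Hive can address the data in place through an external table. A sketch with hypothetical schema, table, and bucket names, using a HiveContext from PySpark:

    # Hedged sketch: query GCS-resident data in place via an external table.
    from pyspark import SparkContext
    from pyspark.sql import HiveContext

    sc = SparkContext()
    hc = HiveContext(sc)

    hc.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS events (
            user_id STRING,
            action  STRING,
            ts      BIGINT
        )
        ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        LOCATION 'gs://my-bucket/events/'
    """)

    hc.sql("SELECT action, COUNT(*) FROM events GROUP BY action").show()
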