Questions tagged [google-cloud-dataproc]

Google Cloud Dataproc is a managed Hadoop MapReduce, Spark, Pig, and Hive service on Google Cloud Platform. The service provides GUI, CLI, and HTTP API access modes for deploying/managing clusters and submitting jobs to clusters. This tag can be added to any question related to using or troubleshooting Google Cloud Dataproc.

1136 questions
0 votes • 1 answer

Dataproc : Submit a Spark Job through REST API

We are using Google Cloud Platform for big-data analytics. For processing we are currently using Google Cloud Dataproc and Spark Streaming. I want to submit a Spark job using the REST API, but when I am calling the URI with the api-key, I am getting…
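The excerpt is cut off, but a frequent cause of auth failures here is that the Dataproc REST API expects an OAuth 2.0 bearer token rather than an API key. A minimal sketch of the `jobs:submit` call (project, cluster, and jar names are made-up placeholders):

```python
import json
import urllib.request

def build_submit_request(project, region, cluster, main_class, jar_uri):
    """Build the URL and JSON body for a Dataproc jobs:submit call."""
    url = (f"https://dataproc.googleapis.com/v1/projects/{project}"
           f"/regions/{region}/jobs:submit")
    body = {
        "job": {
            "placement": {"clusterName": cluster},
            "sparkJob": {
                "mainClass": main_class,
                "jarFileUris": [jar_uri],
            },
        }
    }
    return url, body

def submit(url, body, access_token):
    # The token can come from `gcloud auth print-access-token`;
    # an API key alone is not accepted by the Dataproc API.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Authorization": f"Bearer {access_token}",
                 "Content-Type": "application/json"},
    )
    return urllib.request.urlopen(req)
```

The same request body works from any HTTP client; only the `Authorization` header differs from an API-key call.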
0 votes • 1 answer

Performance monitoring for Google Cloud DataProc

We are using Google Cloud Platform for big-data analytics. For processing we are currently using Google Cloud Dataproc and Spark Streaming. We would like to check job performance using some monitoring utilities like Ganglia, Graphite,…
0 votes • 0 answers

Spark Tasks not evenly distributed among executors (google cloud dataproc)

I noticed that after a repartition the tasks do not always get evenly distributed among the executors. This causes an enormous buildup. The repartition function randomly assigns a partition number for each item. It seems that the tasks are quite…
asked by bjorndv (413 rep)
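`repartition(n)` assigns each record a pseudo-random partition, so persistent imbalance usually points to key skew rather than the shuffle itself. A common remedy (a generic Spark technique, not specific to this question) is key salting; the bucket logic can be sketched in plain Python:

```python
import random
from collections import Counter

def salt_key(key, num_salts, rng):
    # Append a random suffix so a single hot key spreads over num_salts buckets.
    return (key, rng.randrange(num_salts))

rng = random.Random(0)                # fixed seed for a repeatable demo
records = ["hot"] * 1000              # one dominant key -> skew
buckets = Counter(salt_key(r, 8, rng) for r in records)
# In Spark this corresponds to something like:
#   rdd.map(lambda r: (salt_key(r, 8, rng), r)).partitionBy(8, lambda k: k[1])
# followed by a second pass that strips the salt and merges partial results.
```

The hot key now occupies eight buckets instead of one, at the cost of an extra aggregation step.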
0 votes • 1 answer

Dataproc Cluster with Spark 1.6.X using scala 2.11.X

I'm looking for a way to use Spark on Dataproc built with Scala 2.11. I want to use 2.11 since my jobs pull in ~10 BigQuery tables and I'm using the new reflection libraries to map the corresponding objects to case classes. (There's a bug with the…
asked by J.Fratzke (1,067 rep)
0 votes • 1 answer

How can we deploy an existing Kafka - Spark - Cassandra project as Kafka - Dataproc - Cassandra on Google Cloud Platform?

My existing project is Kafka-Spark-Cassandra. Now I have a GCP account and have to migrate my Spark jobs to Dataproc. In my existing Spark jobs, parameters like master IP, memory, cores, etc. are passed through the command line, which is triggered by a Linux…
0 votes • 1 answer

Outputting billions of lines from Spark

I'm trying to output an RDD that has ~5,000,000 rows as a text file using PySpark. It's taking a long time, so what are some tips on how to make the .saveAsTextFile() faster? The rows are 3 columns each, and I'm saving to HDFS.
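`saveAsTextFile()` writes one file per partition, so both very low and very high partition counts hurt throughput. A rough sizing rule (my assumption: aim for ~128 MB per output file) for choosing `n` before `rdd.repartition(n).saveAsTextFile(path)`:

```python
def target_partitions(total_bytes, target_mb=128):
    """Rough partition count so each output file lands near target_mb.
    Ceiling division keeps at least one partition."""
    target = target_mb * 1024 * 1024
    return max(1, -(-total_bytes // target))

# A 10 GiB RDD would be written as ~80 files of ~128 MB each.
```

Enabling output compression (e.g. passing a codec class to `saveAsTextFile`) can also cut write time when HDFS I/O is the bottleneck.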
0 votes • 1 answer

pyspark failed in google dataproc

My job failed with the following logs; however, I don't fully understand them. The failure seems to be caused by "YarnSchedulerBackend$YarnSchedulerEndpoint: Container killed by YARN for exceeding memory limits. 24.7 GB of 24 GB physical…". But how can I…
asked by Hang (1 rep)
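That YARN message usually means off-heap usage (Netty buffers, Python workers, etc.) pushed the container above executor memory plus `spark.yarn.executor.memoryOverhead`. In Spark 1.x that overhead defaults to max(384 MB, 10% of executor memory), and raising the overhead, rather than executor memory, is the usual fix. A sketch of the default:

```python
def default_memory_overhead_mb(executor_memory_mb):
    """Spark 1.x default for spark.yarn.executor.memoryOverhead:
    max(384 MB, 10% of spark.executor.memory)."""
    return max(384, int(executor_memory_mb * 0.10))

# A 24 GiB executor gets ~2457 MB of overhead by default. If containers are
# still killed, an explicit override on job submission, e.g.
#   --properties spark.yarn.executor.memoryOverhead=4096
# (value in MB) gives YARN headroom without shrinking the heap.
```

The `4096` value above is illustrative; the right number depends on how much off-heap memory the job actually uses.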
0 votes • 1 answer

Google Cloud Spark ElasticSearch TransportClient connection exception

I am using Spark on Google Cloud and I have the following code to connect to an Elasticsearch database import org.elasticsearch.action.search.SearchResponse; import org.elasticsearch.client.transport.TransportClient; import…
asked by orestis (712 rep)
0 votes • 1 answer

Dataproc bdutil versioning

Is it possible to set the Hadoop cluster image version using the bdutil command-line tool? Using the web UI console or gcloud it is possible to choose image version 1.0, which supports Hadoop 2.x and Hive 1.2. In contrast, using bdutil, according to the…
0 votes • 1 answer

Google Dataproc and BigQuery integration with custom query

I am running a Spark cluster using Google Dataproc. I would like to get data from BigQuery using a custom query. I am able to run the basic word-count example, but I am looking for a way to run a custom query, e.g. SELECT ROW_NUMBER() OVER() as Id, prop11…
asked by gana (165 rep)
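The Hadoop BigQuery connector of this era reads whole tables rather than query results, so the common workaround is to materialize the query into a temporary table first and point the connector at that. A sketch of the `jobs.insert` request body (project/dataset/table names are placeholders; I'm assuming the BigQuery REST v2 shape):

```python
def build_query_job(project, dataset, table, sql):
    """BigQuery jobs.insert body that writes a query's result into a
    destination table, which the Spark/Hadoop connector can then read."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "allowLargeResults": True,
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }

# job = build_query_job("my-project", "tmp", "numbered_rows",
#                       "SELECT ROW_NUMBER() OVER() AS Id, prop11 FROM ...")
```

`allowLargeResults` requires a destination table anyway, so this pattern covers big result sets as well.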
0 votes • 1 answer

Google Dataflow vs Apache Spark Streaming (either on Google Cloud or with Google Dataproc)

I am new to cloud and big data but have much interest in them, and I have significant experience in Java programming. I am currently working on my uni project comparing the performance of Apache Spark Streaming with Google Cloud Dataflow. I…
0 votes • 2 answers

Apache Mahout on Dataproc?

Is Apache Mahout (https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html) available on Google Dataproc?
0 votes • 2 answers

How to access Cloud SQL from dataproc?

I have a dataproc cluster and I'd like to have the cluster access a Cloud SQL instance. When I created the cluster I assigned scope --scopes sql-admin but after reading the Cloud SQL documentation it looks like I need to connect through a proxy. How…
asked by sthomps (3,458 rep)
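The usual pattern is to run the Cloud SQL proxy on every node (often installed through an initialization action) and let it listen on localhost, so jobs connect to 127.0.0.1 rather than the instance's IP. A small sketch of the resulting connection string (the default MySQL port is my assumption):

```python
def proxy_jdbc_url(database, host="127.0.0.1", port=3306):
    """JDBC URL for MySQL reached through a locally running Cloud SQL proxy.
    With the proxy on every node, jobs never dial the instance IP directly."""
    return f"jdbc:mysql://{host}:{port}/{database}"
```

The proxy itself is started with an `-instances=<project>:<region>:<instance>=tcp:3306`-style flag; the exact invocation depends on the proxy version in use.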
0 votes • 1 answer

Why does Google Dataproc not pull coreNLP jars although they are included in the POM file?

My application is a java maven project that uses Spark. Here's the section in my pom that adds stanford coreNLP dependencies: edu.stanford.nlp stanford-corenlp
asked by Kai (1,275 rep)
0 votes • 1 answer

Google Cloud Sdk from DataProc Cluster

What is the right way to use/install Python Google Cloud APIs such as Pub/Sub from a Google Dataproc cluster? For example, if I'm using Zeppelin/PySpark on the cluster and I want to use the Pub/Sub API, how should I prepare it? It is unclear to me…
asked by ismisesisko (141 rep)