Questions tagged [google-hadoop]

The open-source Apache Hadoop framework can be run on Google Cloud Platform for large-scale data processing, using Google Compute Engine VMs and Persistent Disks and optionally incorporating Google's tools and libraries for integrating Hadoop with other cloud services like Google Cloud Storage and BigQuery.

70 questions
0 votes, 2 answers

Hive INSERT OVERWRITE to Google Storage as LOCAL DIRECTORY not working

I use the following Hive query: hive> INSERT OVERWRITE LOCAL DIRECTORY "gs://Google/Storage/Directory/Path/Name" row format delimited fields terminated by ',' select * from .; I am getting the following…
Sujoy • 1 • 1
0 votes, 3 answers

Job tracking URL in Google Compute Engine not working

I am using Google Compute Engine to run MapReduce jobs on Hadoop (pretty much all default configs). While running the job I get a tracking URL of the form http://PROJECT_NAME:8088/proxy/application_X_Y/ but it fails to open. Did I forget to…
0 votes, 1 answer

Spark 1.4 image for Google Cloud?

With bdutil, the latest version of the tarball I can find is for Spark 1.3.1: gs://spark-dist/spark-1.3.1-bin-hadoop2.6.tgz There are a few new DataFrame features in Spark 1.4 that I want to use. Any chance the Spark 1.4 image will be available for bdutil, or…
Haiying Wang • 622 • 6 • 10
0 votes, 1 answer

How can I use GCP free credit to deploy Hadoop?

How can I use the Google Cloud Platform free trial to test a Hadoop cluster? What are the most important things I should keep in mind if I try this? Will I be charged during the free Google Cloud Platform trial?
James • 2,181 • 11 • 26
0 votes, 1 answer

Deleted Google Storage directory appears to "already exist" when calling Spark DataFrame.saveAsParquetFile()

After I deleted a Google Cloud Storage directory through the Google Cloud Console (the directory had been generated by an earlier Spark (ver 1.3.1) job), re-running the job always fails, and the directory still seems to exist as far as the job is concerned; I cannot find…
Haiying Wang • 622 • 6 • 10
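
A common workaround for the situation in this question is to remove the stale output path through the Hadoop FileSystem API before re-running the job. The following is a minimal sketch, not the asker's code: it assumes the GCS connector is on the classpath and registered for the gs:// scheme, and the bucket and path names are placeholders.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanStaleOutput {
    public static void main(String[] args) throws Exception {
        // Placeholder path for the directory the earlier Spark job created.
        Path output = new Path("gs://my-bucket/output/parquet");

        // Loads core-site.xml; the GCS connector must be configured there
        // (fs.gs.impl) for the gs:// scheme to resolve.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("gs://my-bucket"), conf);

        // Recursively delete the stale directory so the job can recreate it.
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
    }
}
```
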
0 votes, 1 answer

How to create a directory in HDFS on Google Cloud Platform via Java API

I am running a Hadoop cluster on Google Cloud Platform, using Google Cloud Storage as the backend for persistent data. I am able to ssh to the master node from a remote machine and run hadoop fs commands. However, when I try to execute the following code…
gl051 • 561 • 4 • 8
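
As a point of reference for the question above, creating a directory from a Java client boils down to pointing a Hadoop Configuration at the cluster's default file system and calling FileSystem.mkdirs. This is a minimal sketch under that assumption; the hostname, port, and target path are placeholders, and the client must be able to reach the namenode over the network.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirOnCluster {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder: point at the cluster's namenode address.
        conf.set("fs.defaultFS", "hdfs://master-host:8020");

        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/user/example/new-dir");  // placeholder target path
        boolean created = fs.mkdirs(dir);
        System.out.println("mkdirs returned " + created);
    }
}
```
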
0 votes, 1 answer

Spark/Hadoop/YARN cluster communication requires external IP?

I deployed Spark (1.3.1) with yarn-client on a Hadoop (2.6) cluster using bdutil; by default, the instances are created with ephemeral external IPs, and so far Spark works fine. With some security concerns, and assuming the cluster is internal…
Haiying Wang • 622 • 6 • 10
0 votes, 1 answer

Map tasks with input from Cloud Storage use only one worker

I am trying to use a file from Google Cloud Storage via FileInputFormat as input for a MapReduce job. The file is in Avro format. As a simple test, I deployed a small Hadoop2 cluster with the bdutil tool, consisting of the master and two worker…
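
One plausible cause for the behavior described in this question is that the whole input resolves to a single split, so only one map task (and therefore one worker) runs. Below is a minimal sketch of requesting smaller splits through FileInputFormat with the new MapReduce API; the path and split sizes are illustrative placeholders, not a confirmed fix for the asker's setup.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitTuning {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("avro-from-gcs");  // placeholder job name

        // Placeholder input path on Cloud Storage.
        FileInputFormat.addInputPath(job, new Path("gs://my-bucket/input/data.avro"));

        // Ask for smaller splits so a single large file can fan out to
        // several mappers (values are in bytes and purely illustrative).
        FileInputFormat.setMinInputSplitSize(job, 1L);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
    }
}
```
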
0 votes, 1 answer

Multiple Hadoop clusters in one Google Cloud project

Is it possible to deploy several Hadoop clusters in one Google Cloud project?
Evgeny Timoshenko • 2,561 • 4 • 29 • 49
0 votes, 2 answers

Map Only MapReduce Job with BigQuery

We have a MapReduce job created to inject data into BigQuery. There is not much filtering in our job, so we'd like to make it a map-only job to make it faster and more efficient. However, the Java class "com.google.gson.JsonObject"…
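
Independent of the gson class issue the excerpt trails off into, making a job map-only simply means setting the reducer count to zero. A minimal sketch with the new MapReduce API follows; the job name is a placeholder, and the rest of the job configuration (input/output formats, BigQuery connector settings) is omitted.

```java
import org.apache.hadoop.mapreduce.Job;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName("bigquery-ingest-map-only");  // placeholder name

        // With zero reduce tasks the job becomes map-only: each mapper's
        // output is written directly by the configured OutputFormat.
        job.setNumReduceTasks(0);
    }
}
```
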
0 votes, 1 answer

bdutil: How to launch a Hadoop cluster with a requested image id? (Ubuntu 12.04)

When I attempt to launch a Hadoop cluster with the bdutil command, using one of the following: bdutil -b a_hadoop_test -n 1 -P mycluster -e hadoop2_env.sh -i ubuntu-1204 deploy OR bdutil -b a_hadoop_test -n 1 -P mycluster -e hadoop2_env.sh -i…
0 votes, 1 answer

How to force bdutil command to run as root?

I am starting a Google Compute Engine VM from an App Engine application. The start-up scripts for the GCE VM run Python scripts which, in turn, make os.system calls to bdutil commands, e.g., os.system("bdutil --bucket --num_workers 1 " …
0 votes, 1 answer

Spark SQL on Google Compute Engine issue

We are using bdutil 1.1 to deploy a Spark (1.2.0) cluster. However, we are having an issue when we launch our spark script: py4j.protocol.Py4JJavaError: An error occurred while calling o70.registerTempTable. : java.lang.RuntimeException:…
0 votes, 1 answer

Error when running Spark on a Google Cloud instance

I'm running a standalone application using Apache Spark, and when I load all my data into an RDD as a textfile I get the following error: 15/02/27 20:34:40 ERROR Utils: Uncaught exception in thread stdout writer for python java.lang.OutOfMemoryError:…
Saulo Ricci • 698 • 1 • 8 • 23
0 votes, 1 answer

JobTracker - High memory and native thread usage

We are running Hadoop on GCE with HDFS as the default file system, and data input/output from/to GCS. Hadoop version: 1.2.1 Connector version: com.google.cloud.bigdataoss:gcs-connector:1.3.0-hadoop1 Observed behavior: JT will accumulate threads in waiting…