Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signing up, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, you can realize total cost savings of as much as 50%-60% while accomplishing tasks faster.

82 questions

votes

1 answer

How to kill hadoop job gracefully/intercept `hadoop job -kill`

My Java application runs on a mapper and creates child processes using the Qubole API. The application stores the child Qubole queryIDs. I need to intercept the kill signal and shut down the child processes before exit. hadoop job -kill jobId and yarn application -kill…

asked May 30 '17 at 19:16

leftjoin

votes

1 answer

Stratified Sampling in Hive

The following returns a 10% sample of the A and X columns stratified by the values of X. select A, X from( select A, count(*) over (partition by X) as cnt, rank() over (partition by X order by rand()) as rnk from my_table)…
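The same idea can be sketched outside Hive in plain Python: group the rows by the stratum column, then draw a random subset of each group whose size is a fixed fraction of that group. This is only an illustration, not the query from the question (which is truncated above); the function name `stratified_sample`, the dict-shaped rows, and the choice to round the per-stratum size are all assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=None):
    """Return roughly `fraction` of the rows from every stratum.

    `key` maps a row to its stratum value (the X column in the question).
    Rounding the per-stratum size is an assumption of this sketch; very
    small strata can round down to zero sampled rows.
    """
    if not 0 <= fraction <= 1:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)

    # Group rows by stratum, like PARTITION BY X.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    # Draw a random subset from each stratum, like ranking by rand()
    # and keeping the first share of each partition.
    sample = []
    for members in strata.values():
        k = round(fraction * len(members))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 100 rows in stratum "a", 50 rows in stratum "b".
rows = [{"A": i, "X": "a" if i < 100 else "b"} for i in range(150)]
picked = stratified_sample(rows, key=lambda r: r["X"], fraction=0.1, seed=0)
print(len(picked))
```

Sampling per group keeps each stratum's share of the sample equal to its share of the table, which a plain random sample of the whole table does not guarantee.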

sql hive qubole

asked Aug 12 '14 at 21:50

Amelio Vazquez-Reina

74,000
116
321
514

votes

1 answer

Divide Spark DataFrame data into separate files

I have the following DataFrame input from an S3 file and need to transform the data into the following desired output. I am using Spark version 1.5.1 with Scala, but could change to Spark with Python. Any suggestions are welcome. DataFrame…

scala apache-spark dataframe amazon-s3 qubole

asked Nov 11 '16 at 18:18

satoukum

votes

0 answers

Fetch all Column Statistics using Single Query Hive

I understand that all the column statistics can be computed for a Hive table using the command: ANALYZE TABLE Table1 COMPUTE STATISTICS; Then specific column-level stats can be fetched through the command: DESCRIBE FORMATTED…

hive bigdata qubole hive-query

asked Jul 10 '18 at 11:00

Abhi Nandan

votes

1 answer

Insert into ElasticSearch using Hive/Qubole

I am trying to insert data into Elasticsearch from a Hive table. CREATE EXTERNAL TABLE IF NOT EXISTS es_temp_table ( dt STRING, text STRING ) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' …

elasticsearch hive qubole

asked Feb 18 '15 at 18:34

stogers

votes

0 answers

Query Qubole data in Python

I'm trying to query Qubole data in Python, but running into some issues. Below is my code: from qds_sdk.qubole import Qubole Qubole.configure(api_token="api_token", api_url="https://us.qubole.com/api") from qds_sdk.commands import…

python qubole

asked Apr 30 '21 at 20:53

BirdPlay6

votes

1 answer

How to create external tables from parquet files in s3 using hive 1.2?

I have created an external table in Qubole (Hive) which reads Parquet (compressed: snappy) files from S3, but on running SELECT * FROM table_name I am getting null values for all columns except the partitioned column. I tried using different…

hadoop hive hiveql qubole

asked May 15 '19 at 20:21

S.Mehra

votes

1 answer

Debug failed shuffles in hadoop map reduces

As the size of the input file increases, I am seeing the number of failed shuffles increase and the job completion time grow nonlinearly, e.g. 75GB took 1h and 86GB took 5h. I also see the average shuffle time increase tenfold, e.g. 75GB: 4min, 85GB: 41min. Can someone point me…

hadoop mapreduce qubole

asked Sep 21 '18 at 18:03

Jal

votes

2 answers

Fixing java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim

Consider the following error: 2018-07-12 22:46:36,087 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim(Ljava/lang/String;)Ljava/lang/String; at…

java mapreduce aws-java-sdk qubole

asked Jul 12 '18 at 23:04

Jal

votes

1 answer

java.io.FileNotFound exception while writing to apache spark in qubole

I have code in Apache Spark 1.6.3 running on Qubole that writes data to multiple tables (Parquet format) on S3. At the time of writing to the tables I keep getting a java.io.FileNotFound exception. I am even setting:…

apache-spark amazon-s3 eventual-consistency qubole

asked Nov 23 '17 at 04:45

Raghwendra Singh

votes

0 answers

Kafka Connect Hive Integration issue

I am using Kafka Connect for Hive integration to create Hive tables along with partitions on S3. After starting the Connect distributed process and making a POST call to listen to a topic, as soon as there is some data in the topic, I can see in the…

apache-kafka apache-kafka-connect confluent-platform qubole

asked Jul 16 '17 at 19:36

Ashish

votes

1 answer

Median value from table with number:count format

Given a table
+--------+-------+
| Number | Count |
+--------+-------+
| 0      | 7     |
+--------+-------+
| 1      | 1     |
+--------+-------+
| 2      | 3 …
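For a table in this number:count shape, the median can be found without expanding every row: sort by number, accumulate the counts, and stop where the running total passes the middle position. Below is a minimal Python sketch of that cumulative-count approach. The function name `median_from_counts` is hypothetical, and the sample input uses only the three rows visible in the truncated excerpt, so the printed value is not the answer for the asker's full table.

```python
def median_from_counts(pairs):
    """Median of the values described by (number, count) pairs."""
    # Ignore rows with a zero count and order the rest by number.
    rows = sorted((n, c) for n, c in pairs if c > 0)
    total = sum(c for _, c in rows)
    if total == 0:
        raise ValueError("no observations")

    # 0-based positions of the middle element(s) in the expanded,
    # sorted list; they coincide when the total is odd.
    lo_pos, hi_pos = (total - 1) // 2, total // 2

    lo_val = hi_val = None
    seen = 0  # running total of counts
    for number, count in rows:
        seen += count
        if lo_val is None and seen > lo_pos:
            lo_val = number
        if seen > hi_pos:
            hi_val = number
            break
    return (lo_val + hi_val) / 2


# Only the rows visible in the excerpt: 0 x 7, 1 x 1, 2 x 3 (11 values).
print(median_from_counts([(0, 7), (1, 1), (2, 3)]))  # 0.0
```

The same running-total logic maps onto a windowed SUM(Count) in SQL, which is presumably closer to what the question's tags call for.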

mysql sql hive qubole

asked Oct 06 '15 at 06:27

Lenix

vote

1 answer

Insert overwrite doesn't delete all the old data files

We are trying to insert overwrite a Hive table. Most of the time it overwrites as expected, i.e. deleting any old files and replacing them with new files. We are seeing some inconsistencies with this behavior: once in a while all the old files are not…

hive insert hiveql overwrite qubole

asked May 18 '21 at 04:57

Jas

vote

1 answer

Retrieve value in an array of an array with struct

I have a column in a Hive table with type: array<array<struct<…>>> Here is the sample of data in the column: [ [ { "type": "PROFIT", "value": "100", "currency": "USD" }, { …

sql arrays hive hiveql qubole

asked May 06 '21 at 04:15

user1761325

vote

1 answer

Exclude records with certain values in Qubole

Using Qubole I have Table A (columns in json parsed...)
ID  Recommendation  Decision
1   GOOD            GOOD
2   BAD             BAD
2   GOOD            BAD
3   GOOD            BAD
4   BAD             GOOD
4   GOOD            BAD
I…

sql hadoop hive hiveql qubole

asked Nov 24 '20 at 08:00

Kurlito
