Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Source: Creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries and support for hybrid pricing, realize cost savings of as much as 50%-60% in total while accomplishing tasks faster.

82 questions
5
votes
1 answer

How to kill hadoop job gracefully/intercept `hadoop job -kill`

My Java application runs on a mapper and creates child processes using the Qubole API. The application stores the child Qubole queryIDs. I need to intercept the kill signal and shut down the child processes before exit. hadoop job -kill jobId and yarn application -kill…
leftjoin
  • 28,302
  • 6
  • 46
  • 84
5
votes
1 answer

Stratified Sampling in Hive

The following returns a 10% sample of the A and X columns stratified by the values of X. select A, X from( select A, count(*) over (partition by X) as cnt, rank() over (partition by X order by rand()) as rnk from my_table)…
Amelio Vazquez-Reina
  • 74,000
  • 116
  • 321
  • 514
4
votes
1 answer

Divide Spark DataFrame data into separate files

I have the following DataFrame input from an S3 file and need to transform the data into the following desired output. I am using Spark version 1.5.1 with Scala, but could change to Spark with Python. Any suggestions are welcome. DataFrame…
satoukum
  • 848
  • 1
  • 18
  • 24
3
votes
0 answers

Fetch all Column Statistics using Single Query Hive

I understand that all the column statistics can be computed for a Hive table using the command: ANALYZE TABLE Table1 COMPUTE STATISTICS; Then specific column-level stats can be fetched through the command: DESCRIBE FORMATTED…
Abhi Nandan
  • 125
  • 3
  • 8
3
votes
1 answer

Insert into ElasticSearch using Hive/Qubole

I am trying to insert data into Elasticsearch from a Hive table. CREATE EXTERNAL TABLE IF NOT EXISTS es_temp_table ( dt STRING, text STRING ) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' …
stogers
  • 249
  • 2
  • 12
2
votes
0 answers

Query Qubole data in Python

I'm trying to query Qubole data in Python, but running into some issues. Below is my code: from qds_sdk.qubole import Qubole Qubole.configure(api_token="api_token", api_url="https://us.qubole.com/api") from qds_sdk.commands import…
BirdPlay6
  • 53
  • 5
2
votes
1 answer

How to create external tables from parquet files in s3 using hive 1.2?

I have created an external table in Qubole (Hive) which reads Parquet (compressed: snappy) files from S3, but on performing a SELECT * FROM table_name I am getting null values for all columns except the partitioned column. I tried using different…
S.Mehra
  • 56
  • 1
  • 6
2
votes
1 answer

Debug failed shuffles in hadoop map reduces

As the size of the input file increases, I am seeing the number of failed shuffles increase and the job completion time increase non-linearly, e.g. 75GB took 1h, 86GB took 5h. I also see the average shuffle time increase 10-fold, e.g. 75GB: 4min, 85GB: 41min. Can someone point me…
Jal
  • 1,694
  • 13
  • 24
2
votes
2 answers

Fixing java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim

Consider the following error: 2018-07-12 22:46:36,087 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim(Ljava/lang/String;)Ljava/lang/String; at…
Jal
  • 1,694
  • 13
  • 24
2
votes
1 answer

java.io.FileNotFound exception while writing to apache spark in qubole

I have code in Apache Spark 1.6.3 running on Qubole which writes data to multiple tables (Parquet format) on S3. At the time of writing to the tables I keep getting a java.io.FileNotFound exception. I am even setting:…
2
votes
0 answers

Kafka Connect Hive Integration issue

I am using Kafka Connect for Hive integration to create Hive tables along with partitions on S3. After starting the Connect distributed process and making a POST call to listen to a topic, as soon as there is some data in the topic, I can see in the…
2
votes
1 answer

Median value from table with number:count format

Given a table +------------+-----------+ | Number | Count | +------------+-----------+ | 0 | 7 | +------------+-----------+ | 1 | 1 | +------------+-----------+ | 2 | 3 …
Lenix
  • 23
  • 3
1
vote
1 answer

Insert overwrite doesn't delete all the old data files

We are trying to insert overwrite a Hive table. Most of the time it overwrites as expected, i.e. deleting any old files and replacing them with new files. We are seeing some inconsistencies with this behavior; once in a while all the old files are not…
Jas
  • 11
  • 2
1
vote
1 answer

Retrieve value in an array of an array with struct

I have a column in Hive table with type: array>> Here is the sample of data in the column: [ [ { "type": "PROFIT", "value": "100", "currency": "USD" }, { …
user1761325
  • 81
  • 1
  • 7
1
vote
1 answer

Exclude records with certain values in Qubole

Using Qubole I have Table A (columns in json parsed...) ID Recommendation Decision 1 GOOD GOOD 2 BAD BAD 2 GOOD BAD 3 GOOD BAD 4 BAD GOOD 4 GOOD BAD I…
Kurlito
  • 13
  • 3