Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic, Hadoop-based cluster.

The creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data-savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries and support for hybrid pricing, you can realize total cost savings of as much as 50-60% while accomplishing tasks faster.

82 questions

1 answer

AWS S3 access issue when using qubole/streamx on AWS EMR

I am using qubole/streamx as a Kafka sink connector to consume data in Kafka and store it in AWS S3. I created a user in IAM with the AmazonS3FullAccess permission, then set the key ID and key in hdfs-site.xml, whose directory is assigned in…

asked Feb 14 '17 at 10:49

Chris Feng

1 answer

pyspark job on qubole fails with "Retrying exception reading mapper output"

I have a pyspark job running via qubole which fails with the following error. Qubole > Shell Command failed, exit code unknown Qubole > 2016-12-03 17:36:53,097 ERROR shellcli.py:231 - run - Retrying exception reading mapper output: (22, 'The…

pyspark qubole

asked Dec 03 '16 at 17:50

Lekha Muraleedharan

1 answer

How do I optimize my Hive query for finding the sum of record counts from multiple tables?

I have to generate a report that gives me the sum of the counts from tables A, B and C for events that have been stored using Hive, and my S3 buckets have been partitioned by Organization_id. For example: Table A has a record for every day John (and…

hadoop amazon-s3 hiveql qubole

asked Mar 30 '16 at 15:48

Ajay

1 answer

Unable to create a table in Qubole similar to one in MySQL

I want to create an external table in Qubole similar to a table created in MySQL. The query to create the table in MySQL is: CREATE TABLE `mytable` ( `id` varchar(50) NOT NULL, `v_count` int(11) DEFAULT NULL, `l_visited` timestamp NOT NULL DEFAULT…

mysql hive qubole

asked Dec 09 '15 at 05:26

Rahul Kumar

2 answers

Autoscaling EMR: is it required? Should I just use EC2? Should I just use Qubole?

In order to reduce the time for provisioning, we've decided to keep up a dedicated EMR cluster with 5 instances (we expect to need about 5). In case we need more, we think we'll need to implement some sort of autoscaling. I'm not familiar at all…

hadoop amazon-web-services emr autoscaling qubole

asked Nov 05 '14 at 00:13

user1136342

0 answers

Report output is wrong in Qubole when exporting

I'm facing a bizarre issue on Qubole. I'm generating a report on Qubole and using a Bash command with my AWS access key and secret key to export the data. On S3, I see the correct file name, but the report is not the one I'm expecting.…

amazon-s3 data-export qubole

asked Apr 16 '21 at 14:52

Paras Khushiramani

0 answers

Migration of Qubole objects

How do I migrate Qubole objects (notebooks, schedules, environments, cluster configs) from https://api.qubole.com to the https://us.qubole.com QDS environment?

hadoop mapreduce hadoop2 qubole

asked Mar 23 '21 at 10:58

muku

1 answer

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters. Thus I would like a SQL injection-proof way to insert parameters into a query and get…

python sql sql-injection qubole

asked Jan 04 '21 at 16:51

Roméo Després

0 answers

Kinesis Spark Qubole: can't get newer records

I am trying to get records from my stream with qubole kinesis spark library: val kinesis = sparkContextService.SQLC.sparkSession.readStream .format("kinesis") .option("streamName", "streamName") .option("region", "region") …

amazon-web-services apache-spark amazon-kinesis qubole

asked Nov 20 '20 at 14:14

Tom Hill

2 answers

How to get Python in Qubole to save CSV and TXT files to Azure data lake?

I have Qubole connected to Azure data lake, and I can start a spark cluster, and run PySpark on it. However, I can't save any native Python output, like text files or CSVs. I can't save anything other than Spark SQL DataFrames. What should I do to…

python azure qubole

asked Aug 03 '20 at 19:21

HT.

1 answer

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours, but because of the QDS timeout limit of 36 hours, my command gets killed after 36 hours. Can someone help me change this parameter value to set to…

python qubole

asked Jun 17 '20 at 04:53

Trupti

1 answer

Logging and Debugging on Qubole

How does one log on Qubole, or access logs from Spark on Qubole? The setup I have: a Java library (JAR); a Zeppelin notebook (Scala) simply calling a method from the library; Spark on a YARN cluster; Log4j2 used in the library (configured to log to stdout). How…

apache-spark qubole

asked May 26 '20 at 06:50

bde.dev

1 answer

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming, with Kafka as the input stream and HiveAcid as the writeStream sink to a Hive table. HiveAcid is an open-source library called spark-acid from Qubole: https://github.com/qubole/spark-acid Below is our…

apache-spark spark-structured-streaming qubole spark-hive spark-checkpoint

asked May 22 '20 at 06:56

Shuwn Yuan Tee

1 answer

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it to S3 under s3://{bucket-name}/foo/bar/{date}/archive.zip; the second one pre-signs that URL (it should expire in 7 days) and sends it to…

amazon-web-services airflow qubole

asked May 12 '20 at 13:16

Maria Livia

3 answers

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions for a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java client to do this (I am already using it for other…

hive qubole

asked Apr 22 '20 at 22:44

GreenGiant
