Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

Creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signup, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, you can realize cost savings of as much as 50%-60% in total while accomplishing tasks faster.

82 questions
1
vote
1 answer

AWS S3 access issue when using qubole/streamx on AWS EMR

I am using qubole/streamx as a Kafka sink connector to consume data from Kafka and store it in AWS S3. I created a user in IAM with the AmazonS3FullAccess permission. Then I set the key ID and key in hdfs-site.xml, whose dir is assigned in…
Chris Feng
  • 149
  • 2
  • 16
1
vote
1 answer

pyspark job on qubole fails with "Retrying exception reading mapper output"

I have a pyspark job running via qubole which fails with the following error. Qubole > Shell Command failed, exit code unknown Qubole > 2016-12-03 17:36:53,097 ERROR shellcli.py:231 - run - Retrying exception reading mapper output: (22, 'The…
1
vote
1 answer

How do I optimize my hive query for finding Sum of Count of Records from multiple tables

I have to generate a report that gives me the sum of the counts from tables A, B and C for events that have been stored using Hive. My S3 buckets are partitioned by Organization_id. For example: Table A has a record for every day John (and…
Ajay
  • 11
  • 2
1
vote
1 answer

Unable to create table in Qubole similar to MySQL

I want to create an external table in Qubole similar to a table created in MySQL. The query to create the table in MySQL is: CREATE TABLE `mytable` ( `id` varchar(50) NOT NULL, `v_count` int(11) DEFAULT NULL, `l_visited` timestamp NOT NULL DEFAULT…
Rahul Kumar
  • 141
  • 1
  • 5
1
vote
2 answers

Autoscaling EMR- is it required? Should I just use EC2? Should I just use Qubole?

In order to reduce the time for provisioning, we've decided to keep up a dedicated EMR cluster with 5 instances (we expect to need about 5). In case we need more, we think we'll need to implement some sort of autoscaling. I'm not familiar at all…
user1136342
  • 3,901
  • 8
  • 25
  • 38
0
votes
0 answers

Report output is wrong in Qubole when exporting

I'm facing a bizarre issue on Qubole: I'm generating a report on Qubole and using a Bash command with my AWS key and secret key to export the data. On my S3 server, I see the correct file name, but the report is not the one I'm expecting.…
0
votes
0 answers

Migration of Qubole objects

How can I migrate/move Qubole objects (notebooks, schedules, environments, cluster configs) from https://api.qubole.com to the https://us.qubole.com QDS environment?
muku
  • 39
  • 1
  • 6
0
votes
1 answer

How to safely insert parameters into a SQL query and get the resulting query?

I have to use a non-DBAPI-compliant library to interact with a database (qds_sdk for Qubole). This library only allows sending raw SQL queries without parameters. Thus I would like a SQL injection-proof way to insert parameters into a query and get…
Roméo Després
  • 762
  • 1
  • 8
  • 21
0
votes
0 answers

Kinesis Spark Qubole: can't get newer records

I am trying to get records from my stream with the Qubole Kinesis Spark library: val kinesis = sparkContextService.SQLC.sparkSession.readStream .format("kinesis") .option("streamName", "streamName") .option("region", "region") …
0
votes
2 answers

How to get Python in Qubole to save CSV and TXT files to Azure data lake?

I have Qubole connected to Azure data lake, and I can start a spark cluster, and run PySpark on it. However, I can't save any native Python output, like text files or CSVs. I can't save anything other than Spark SQL DataFrames. What should I do to…
HT.
  • 25
  • 4
0
votes
1 answer

How to change the timeout value when running commands on QDS

I have a spark-submit command that calls my Python script. The code runs for more than 36 hours; however, because of the QDS timeout limit of 36 hours, my command gets killed after 36 hours. Can someone help me change this parameter value to set to…
Trupti
  • 1
0
votes
1 answer

Logging and Debugging on Qubole

How does one log on Qubole/access logs from Spark on Qubole? The setup I have: a Java library (JAR); a Zeppelin notebook (Scala), simply calling a method from the library; Spark, YARN cluster; Log4j2 used in the library (configured to log to stdout). How…
bde.dev
  • 419
  • 5
  • 9
0
votes
1 answer

Spark Structured Streaming using spark-acid writeStream (with checkpoint) throwing org.apache.hadoop.fs.FileAlreadyExistsException

In our Spark app, we use Spark Structured Streaming. It uses Kafka as the input stream and HiveAcid as the writeStream to a Hive table. HiveAcid comes from an open source library called spark-acid from Qubole: https://github.com/qubole/spark-acid Below is our…
0
votes
1 answer

Avoid pre-signed URL expiry when IAM role key rotates

In Airflow I have 2 tasks defined that run every day: the first one creates a zip file and saves it in AWS under s3://{bucket-name}/foo/bar/{date}/archive.zip; the second one pre-signs that URL (it should expire in 7 days) and sends it to…
Maria Livia
  • 65
  • 1
  • 7
0
votes
3 answers

How to query table partitions list using

I need to programmatically query Qubole for the list of partitions for a Hive table. I can do this by calling the correct API endpoint as described here, but I would like to use the qds-sdk-java client to do this (I am already using it for other…
GreenGiant
  • 4,226
  • 1
  • 40
  • 69