Questions tagged [qubole]

Qubole Data Service (QDS) is a cloud Big Data service running on an elastic Hadoop-based cluster.

The creators of Facebook’s Big Data infrastructure and Apache Hive have leveraged their experience to deliver Qubole Data Service (QDS), a cloud Big Data service offering the same advanced capabilities used by Big Data savvy organizations.

Minimize operational interaction and provide your data analysts with an easy-to-use graphical interface, built-in connectors, and seamless, elastic cloud infrastructure.

Your Hadoop cluster is ready within minutes of signing up, letting you focus on building sophisticated data pipelines, running queries, scheduling jobs, and monetizing your big data.

With an auto-scaling cluster, improved I/O optimization, faster queries, and support for hybrid pricing, you can realize cost savings of as much as 50%-60% in total while accomplishing tasks faster.

82 questions
0
votes
2 answers

Implement case class inside a class

I am running the code below in a Qubole Notebook, and it runs successfully. case class cls_Sch(Id:String, Name:String) class myClass { implicit val sparkSession =…
Sarath KS
0
votes
1 answer

retrieve size of data copied with hadoop distcp

I am running a hadoop distcp command as below: hadoop distcp src-loc target-loc. I want to know the size of the data copied by running this command. I am planning to run the command on Qubole. Any help is appreciated.
sneha salvi
0
votes
1 answer

Big files causing shuffle error in hadoop map reduce

I am seeing the following error when I try to process big files (size > 35GB), but it doesn't happen with smaller files (size < 10GB). App > Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in…
Jal
0
votes
1 answer

Get correct value from array in Hive QL

I have a Wrapped Array and want to get only the corresponding value struct when I query with LATERAL VIEW EXPLODE. SAMPLE STRUCTURE: COLUMNNAME: theARRAY WrappedArray([null,theVal,valTags,[123,null,null,null,null,null],false],…
noobeerp
0
votes
0 answers

Convert column in presto from epoch to date

I tried this, but it didn't work: cast(from_unixtime('1532568232662880')) as date. Any other ideas?
nak5120
0
votes
1 answer

Amazon s3Exception bad request and location constraint in hadoop s3a

Does location constraint require extra permission policy for hadoop s3a? I am seeing Exception in thread "main" com.qubole.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad…
Jal
0
votes
1 answer

How do I get the value without the square brackets

I have created a dataframe using Scala and Spark SQL. I wanted the first value from the table but I am getting it inside of square brackets []. Can I just get the value without the brackets? Code: val sigh = sqlContext.sql("""SELECT DISTINCT…
0
votes
0 answers

select a table from a database in R

I am using dbplyr to select a table from a remote database using RStudio. I connected to Spark on the server using Livy. It shows me the databases I have, but when I try to access one of the tables in one of the schemas, it…
Fisseha Berhane
0
votes
1 answer

Get Not Null Values in Wrapped Array

I have a Wrapped Array and want to get only the non-null values when I query with LATERAL VIEW EXPLODE. I also tried IS NOT NULL, but that does not return anything. SAMPLE STRUCTURE: COLUMNNAME:…
noobeerp
0
votes
1 answer

Set partition location in Qubole metastore using Spark

How do I set the partition location for my Hive table in the Qubole metastore? I know that it is a MySQL DB, but how do I access it and run a SQL script with a fix using Spark? UPD: The issue is that ALTER TABLE table_name [PARTITION (partition_spec)] SET…
Vova Lis
0
votes
0 answers

Container Packing in YARN

Qubole has implemented Container Packing in YARN for cloud deployments to reduce infrastructure cost. Is there any similar implementation available in the open source world?
banjara
0
votes
1 answer

Qubole: How can I download scheduler result in python?

As the title says, I managed to download the Qubole result using the query ID in Python. However, is there a way to download the result using the scheduler job ID instead of the query ID? Thanks.
atsang01
0
votes
1 answer

unable to connect ms sql server from Presto in Qubole

I am using Qubole Data Service on Microsoft Azure. I have created a Presto cluster in Qubole. I want to connect to MS SQL Server from Presto to read data from it. I have created a sqlserver directory on…
Heta Desai
0
votes
1 answer

Comparing one day worth of data from S3 buckets faster

Consider 2 data flows below:
1. Front End Box ----> S3 Bucket-1
2. Front End Box ----> Kafka --> Storm ---> S3 Bucket-2
The logs from the boxes are being transferred to S3 buckets. The requirement is to replace flow 1 by flow 2. Now the data…
Albatross
0
votes
1 answer

How to query data from gz file of Amazon S3 using Qubole Hive query?

I need to get specific data from a gz file. How do I write the SQL? Can I just query it like a database table: Select * from gz_File_Name where key = 'keyname' limit 10. It always comes back with an error.
daxue