Questions tagged [azure-hdinsight]

Questions about Azure HDInsight, a managed Apache Hadoop service that lets you run Apache Spark, Apache Hive, Apache Kafka, Apache HBase, and more in the Microsoft Azure cloud.

917 questions
72
votes
6 answers

Differences between Azure Block Blob and Page Blob?

As I recently started experimenting with Windows Azure, I've come across a situation where I have to choose between Block Blob and Page Blob. I'm currently in the process of uploading some text, csv or dat files to blob storage and then do a…
Kulasangar
  • 7,225
  • 3
  • 36
  • 71
16
votes
2 answers

What does %{ $_.Key1 } mean?

While programming for HDInsight I came across lines like $storageAccountKey = Get-AzureRmStorageAccountKey -ResourceGroupName $resourceGroupName -Name $storageAccountName | %{ $_.Key1 } I understand $_ refers to the result of the…
Frank im Wald
  • 846
  • 1
  • 10
  • 26
12
votes
5 answers

Azure Data lake VS Azure HDInsight

I was going through the Microsoft documents: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-overview I'm new to Azure Data lake and HDInsight. There is a statement in the URL which tells that "Azure Data Lake Store can be…
AskMe
  • 2,003
  • 4
  • 31
  • 65
12
votes
3 answers

Spark SQL: How to consume json data from a REST service as DataFrame

I need to read some JSON data from a web service that's providing REST interfaces to query the data from my Spark SQL code for analysis. I am able to read a JSON stored in the blob store and use it. I was wondering what is the best way to read the…
Kiran
  • 2,754
  • 5
  • 26
  • 61
10
votes
2 answers

ConcurrentModificationException when using Spark collectionAccumulator

I'm trying to run a Spark-based application on an Azure HDInsight on-demand cluster, and am seeing lots of SparkExceptions (caused by ConcurrentModificationExceptions) being logged. The application runs without these errors when I start a local…
codebox
  • 18,210
  • 7
  • 54
  • 77
9
votes
1 answer

Create hive external table from partitioned parquet files in Azure HDInsights

I have data saved as parquet files in Azure blob storage. Data is partitioned by year, month, day and hour like: cont/data/year=2017/month=02/day=01/ I want to create an external table in Hive using the following create statement, which I wrote using this…
chhantyal
  • 10,570
  • 6
  • 45
  • 71
9
votes
2 answers

How to load CSVs with timestamps in custom format?

I have a timestamp field in a csv file that I load to a dataframe using spark csv library. The same piece of code works on my local machine with Spark 2.0 version but throws an error on Azure Hortonworks HDP 3.5 and 3.6. I have checked and Azure…
9
votes
3 answers

Create external table with select from other table

I am using HDInsight and need to delete my clusters when I am finished running queries. However, I need the data I gather to survive for another day. I am working on queries that would create calculated columns from table1 and insert them into…
Roger
  • 1,916
  • 3
  • 29
  • 62
9
votes
2 answers

In Hive, how can I add a column only if that column does not exist?

I would like to add a new column to a table, but only if that column does not already exist. This works if the column does not exist: ALTER TABLE MyTable ADD COLUMNS (mycolumn string); But when I execute it a second time, I get an error. Column…
MattD
  • 1,148
  • 2
  • 13
  • 25
8
votes
3 answers

How to efficiently store and query a billion rows of sensor data

Situation: I've started a new job and been assigned the task of figuring out what to do with their sensor data table. It has 1.3 billion rows of sensor data. The data is pretty simple: basically just a sensor ID, a date and the sensor value at that…
7
votes
2 answers

spark-shell error : No FileSystem for scheme: wasb

We have an HDInsight cluster running in Azure, but it doesn't allow spinning up an edge/gateway node at the time of cluster creation. So I was creating this edge/gateway node by installing echo 'deb…
roy
  • 4,437
  • 16
  • 58
  • 118
7
votes
1 answer

Is there a Spark SQL jdbc driver?

I'm looking for a client jdbc driver that supports Spark SQL. I have been using Jupyter so far to run SQL statements on Spark (running on HDInsight) and I'd like to be able to connect using JDBC so I can use third-party SQL clients (e.g. SQuirreL,…
aaronsteers
  • 1,143
  • 1
  • 9
  • 25
7
votes
2 answers

Azure Storm vs Azure Stream Analytics

Looking to do real time metric calculations on event streams, what is a good choice in Azure? Stream Analytics or Storm? I am comfortable with either SQL or Java, so wondering what are the other differences.
6
votes
1 answer

How to use Avro on HDInsight Spark/Jupyter?

I am trying to read in an Avro file inside an HDInsight Spark/Jupyter cluster but got u'Failed to find data source: com.databricks.spark.avro. Please find an Avro package at http://spark.apache.org/third-party-projects.html;' Traceback (most recent…
Jiew Meng
  • 74,635
  • 166
  • 442
  • 756
6
votes
4 answers

Submit a Spark job from C# and get results

As per the title, I would like to request a calculation from a Spark cluster (local/HDInsight in Azure) and get the results back in a C# application. I am aware of Livy, which I understand is a REST API application sitting on top of…
Stefano d'Antonio
  • 5,164
  • 2
  • 25
  • 42