Questions tagged [hortonworks-data-platform]

Hortonworks Data Platform (HDP) is an open-source distribution of Apache Hadoop that bundles a set of projects on top of HDFS and YARN, the core layers of Hadoop.

1245 questions
49
votes
13 answers

Spark read file from S3 using sc.textFile ("s3n://...)

Trying to read a file located in S3 using spark-shell: scala> val myRdd = sc.textFile("s3n://myBucket/myFile1.log") lyrics: org.apache.spark.rdd.RDD[String] = s3n://myBucket/myFile1.log MappedRDD[55] at textFile at <console>:12 scala>…
Polymerase
  • 5,067
  • 6
  • 35
  • 54
30
votes
5 answers

Find port number where HDFS is listening

I want to access HDFS with fully qualified names such as: hadoop fs -ls hdfs://machine-name:8020/user. I could also simply access HDFS with hadoop fs -ls /user. However, I am writing test cases that should work on different distributions (HDP,…
ernesto
  • 1,739
  • 3
  • 24
  • 33
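As a rough illustration of the fully qualified form mentioned in this question, the sketch below splits an HDFS URI into host and port. The helper name `hdfs_endpoint` is hypothetical, and the sketch assumes the URI string is already known (for example, taken from the cluster's `fs.defaultFS` setting); it does not query a cluster.

```python
from urllib.parse import urlsplit


def hdfs_endpoint(uri):
    """Return (host, port) from a fully qualified HDFS URI.

    Hypothetical helper for illustration only. The port is None when
    the URI does not state one explicitly.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got: " + uri)
    return parts.hostname, parts.port


# URI taken from the question excerpt above.
host, port = hdfs_endpoint("hdfs://machine-name:8020/user")
print(host, port)
```

Test cases that must run on several distributions could use a helper like this to avoid hard-coding the port, provided the URI itself is read from each cluster's configuration.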
26
votes
6 answers

How to delete files from the HDFS?

I just downloaded the Hortonworks sandbox VM, which ships with Hadoop 2.7.1. I add some files using the hadoop fs -put /hw1/* /hw1 ...command. Afterwards I delete the added files with the hadoop fs -rm /hw1/* ...command,…
serg
  • 883
  • 3
  • 12
  • 21
22
votes
2 answers

sqlContext HiveDriver error on SQLException: Method not supported

I have been trying to use sqlContext.read.format("jdbc").options(driver="org.apache.hive.jdbc.HiveDriver") to load a Hive table into Spark, without success. I have done some research and read the following: How to connect to remote hive server from spark Spark…
HP.
  • 17,550
  • 43
  • 139
  • 240
18
votes
2 answers

Got InterruptedException while executing word count mapreduce job

I have installed Cloudera VM version 5.8 on my machine. When I execute the word count MapReduce job, it throws the exception below: 16/09/06 06:55:49 WARN hdfs.DFSClient: Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native…
16
votes
5 answers

How to disable Transparent Huge Pages (THP) in Ubuntu 16.04LTS

I am setting up an Ambari cluster with 3 VirtualBox VMs running Ubuntu 16.04 LTS. However, I get the following warning: The following hosts have Transparent Huge Pages (THP) enabled. THP should be disabled to avoid potential Hadoop performance…
thanuja
  • 516
  • 1
  • 6
  • 18
16
votes
1 answer

Connection reset by peer while running Apache Spark Job

We have two HDP clusters set up; let's call them A and B. CLUSTER A NODES: It contains a total of 20 commodity machines. There are 20 data nodes. As NameNode HA is configured, there is one active and one standby NameNode. CLUSTER B NODES: It…
Aniketh Jain
  • 543
  • 5
  • 21
16
votes
1 answer

Spark on YARN resource manager: Relation between YARN Containers and Spark Executors

I'm new to Spark on YARN and don't understand the relation between YARN containers and Spark executors. I tried out the following configuration, based on the results of the yarn-utils.py script, which can be used to find optimal cluster…
12
votes
2 answers

How do I get independent service Zeppelin to see Hive?

I am using HDP-2.6.0.3 but I need Zeppelin 0.8, so I have installed it as an independent service. When I run: %sql show tables I get nothing back and I get 'table not found' when I run Spark2 SQL commands. Tables can be seen in the 0.7 Zeppelin…
schoon
  • 1,878
  • 3
  • 26
  • 56
12
votes
3 answers

Install error: ftheader.h: No such file or directory

When I am trying to build matplotlib-1.3.1, I am getting the below freetype header errors. Probably it is not finding the ftheader.h. Any idea on how to solve this problem? NOTE: I just installed Freetype-2.5.0.1 following the instructions as…
somnathchakrabarti
  • 2,706
  • 9
  • 58
  • 87
11
votes
1 answer

Ext JS library not installed correctly in Oozie

I get the following message when I access the Oozie UI: "Oozie web console is disabled. To enable Oozie web console install the Ext JS library." I'm using the HDP distribution, installed through the Ambari service installer. I tried to follow…
JaviOverflow
  • 1,142
  • 1
  • 9
  • 20
11
votes
4 answers

How to find Hadoop hdfs directory on my system?

How do I find the Hadoop HDFS directory on my system? I need this to run the following command: hadoop dfs -copyFromLocal. I don't know my hdfs-dir for this command. Not sure if it's helpful, but I ran the following command and got…
N..
  • 1,406
  • 3
  • 21
  • 37
10
votes
2 answers

ERROR 1066: Unable to open iterator for alias in Pig, Generic solution

A very common error message in Apache Pig is: ERROR 1066: Unable to open iterator for alias. There are several questions where this error is mentioned, but none of them gives a generic approach for dealing with it. Hence this question: What to do…
Dennis Jaheruddin
  • 19,745
  • 7
  • 58
  • 100
10
votes
1 answer

Send KafkaProducer from local machine to hortonworks sandbox on virtualbox

I have a really simple producer that I am running through Eclipse on my local Windows machine... What I really want is to get a message through to Kafka, so I will be able to view the broker through ZooKeeper. Just to see how communication works…
Mez
  • 4,365
  • 4
  • 24
  • 52
10
votes
6 answers

Hive: Sum over a specified group (HiveQL)

I have a table:

key  product_code  cost
1    UK            20
1    US            10
1    EU            5
2    UK            3
2    EU            6

I would like to find the sum of all products for each group of "key" and…
joshlk
  • 1,190
  • 3
  • 17
  • 28
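To make the grouping in this question concrete, here is a minimal sketch in plain Python that sums `cost` per `key` for the sample rows in the excerpt. It covers only the per-key sum (the rest of the question is truncated), and the function name `sum_cost_by_key` is hypothetical. In HiveQL the same aggregation would typically be written with `SUM(cost)` and `GROUP BY key`.

```python
from collections import defaultdict

# Sample rows from the question excerpt: (key, product_code, cost).
rows = [
    (1, "UK", 20),
    (1, "US", 10),
    (1, "EU", 5),
    (2, "UK", 3),
    (2, "EU", 6),
]


def sum_cost_by_key(rows):
    """Return a dict mapping each key to the total cost of its rows."""
    totals = defaultdict(int)
    for key, _product_code, cost in rows:
        totals[key] += cost
    return dict(totals)


totals = sum_cost_by_key(rows)
print(totals)  # {1: 35, 2: 9}
```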