Questions tagged [hadoop3]

Use for questions specific to Apache Hadoop 3.0 features (e.g. Erasure Coding, YARN Timeline Service v2, Opportunistic Containers, 3+ NameNode fault-tolerance). For general questions related to Apache Hadoop, use the tag [hadoop].

99 questions
0
votes
0 answers

Hadoop trying to use JDK install directory as executable command

I am new to Hadoop, and attempting to get the first simple 'Word count' example to run. I have the same problem that was reported here (but the responses there don't resolve the problem): "Could not run jar file in hadoop3.1.3". Java is installed to…
0
votes
0 answers

Apache Spark 2.4 compatibility with Hadoop 3.2 environment

We are trying to run Spark 2.4.x on Hadoop 3.2, but we have noticed that Databricks only has prebuilt versions for Hadoop 2.7/2.6 for Spark 2.4.x. Here is the download page link from the official Databricks site: https://spark.apache.org/downloads.html so…
linehrr
0
votes
0 answers

How to build Hive from source against Hadoop version 3.x with a custom JDBC version that supports connecting to hiveserver version 2.x

How to build Hive from source against Hadoop version 3.x with a custom JDBC version that supports connecting to hiveserver version 2.x. I have looked at: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ#HiveDeveloperFAQ-Building As…
Cod_enthu
0
votes
1 answer

Hadoop3 balancer vs disk balancer

I read the Hadoop 3 documentation on the disk balancer, which says: "Diskbalancer is a command line tool that distributes data evenly on all disks of a datanode. This tool is different from Balancer which takes care of cluster-wide data balancing." I…
eyeballs
0
votes
0 answers

Hadoop web interface is not working even though the nodes are starting

I'm trying to install Hadoop v3.1.3 in pseudo-distributed mode in my Ubuntu 18.04 environment. After following the documentation word for word, my web interface is still not working, i.e. localhost:9870 yields no result. Log files are getting created…
Utsav
0
votes
0 answers

Hadoop Distcp CopyListing$DuplicateFileException issue

Running a distcp job with the DistCp options below: optionsDistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=snap1, toSnapshot=snap2,…
0
votes
1 answer

Hadoop 3.2.1 localhost: ERROR: You must be a privileged user in order to run a secure service

I am trying to install a simple Hadoop setup on Ubuntu 20 under Windows WSL. I am able to get the NameNode and YARN running, but the DataNode is failing. I get the following error when running start-dfs.sh: hadoopuser@mycompu:~/hadoop$…
virtuvious
0
votes
1 answer

Not able to create Hive table using Sqoop

I am trying the command below to import the MySQL table stocks into Hive (v3.1.2) on Ubuntu 18.04 with Hadoop 3, using Sqoop (v1.4.7): sqoop import --connect jdbc:mysql://localhost/myhadoop --username hiveuser --password xxx --table stocks --bindir…
suneesh
0
votes
0 answers

Hadoop YARN resource manager not able to start due to error

I am trying to run Hadoop (HDFS and YARN) in a multi-node cluster (2 nodes), but the resource manager fails to start on the slave node. Basically, it fails due to the exception below: it is not able to find a class called javax.activation.DataSource (which is…
Learner
0
votes
1 answer

Sqoop is not working on my Ubuntu 18.04 with Hadoop 3.1.3

I am getting the error below on my Ubuntu (18.04) machine while launching Sqoop (1.4.7, Hadoop 3.1.3). Command used: sqoop import --connect jdbc:mysql://localhost/myhadoop --username hiveuser --password xxxx --table employee --split-by --target-dir…
suneesh
0
votes
1 answer

Hadoop Client unable to connect to datanode

I have a single-node Hadoop cluster on EC2. I tried all possible combinations in the slaves file. May 01 2020 08:16:25.227 DEBUG org.apache.hadoop.hdfs.DFSClient - pipeline = 172.31.45.114:9866 May 01 2020 08:16:25.227 DEBUG…
0
votes
0 answers

Could not find or load main class hdfs problem

I am trying to use Apache Rya for some tests (https://rya.apache.org/). For those who are familiar with Rya and RDF stores, I am trying to do a bulk loading which is explained here:…
M.Taki_Eddine
0
votes
0 answers

ERROR: datanode can only be executed by harry

I want to start all daemons (namenode and datanode), but when I ran start-all.sh it returned: ERROR: datanode can only be executed by harry. How do I fix this?
Key Jun
0
votes
1 answer

Secondary Name Node in Hadoop

Suppose the default checkpoint interval is 1 hour. If the NameNode goes down 55 minutes after the last checkpoint, do we lose the last 55 minutes of data (since the edit log data has not yet been merged into the fsimage)?
0
votes
1 answer

Dask - trying to read hdfs data getting error ArrowIOError: HDFS file does not exist

I tried creating a dataframe from a CSV file stored in HDFS. The connection is successful, but when I try to get the output of the len function, I get an error. Code: from dask_yarn import YarnCluster from dask.distributed import Client, LocalCluster import…
Nirmal Ram