
We have an HDP cluster, version 2.6.5.

The cluster has two NameNodes in an HA setup (one active, the other standby) and 65 DataNode machines.

We have a problem with the standby NameNode: it does not start, and in the NameNode logs we see the following:

2021-01-01 15:19:43,269 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log.  We expected txid 90247527115, but got txid 90247903412.
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:838)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:693)
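To see how large the gap is, the two transaction IDs in the error can be pulled out of the log and subtracted. A minimal sketch; `edit_log_gap` is a hypothetical helper name, and it assumes the error message keeps the exact wording shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads NameNode log text on stdin, finds the
# "gap in the edit log" message and prints how many txids are missing.
edit_log_gap() {
  local line expected got
  # Keep only the last matching message, in case it was logged several times.
  line=$(grep -o 'We expected txid [0-9]*, but got txid [0-9]*' | tail -n 1)
  if [ -z "$line" ]; then
    echo "no edit-log gap message found" >&2
    return 1
  fi
  expected=$(echo "$line" | sed -E 's/.*expected txid ([0-9]+),.*/\1/')
  got=$(echo "$line" | sed -E 's/.*got txid ([0-9]+).*/\1/')
  # Shell arithmetic is 64-bit here, wide enough for HDFS transaction IDs.
  echo $(( got - expected ))
}

# Example with the error line from the log above.
echo 'java.io.IOException: There appears to be a gap in the edit log.  We expected txid 90247527115, but got txid 90247903412.' | edit_log_gap
```

On a real host you would pipe the NameNode log into it, e.g. `edit_log_gap < /path/to/namenode.log` (the path is a placeholder; on HDP it depends on the configured log directory). For the error above, the gap covers 376297 transaction IDs.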

Ambari also shows the standby NameNode as down.


For now the active NameNode is up but the standby NameNode is down, and the root cause appears to be damaged/corrupted NameNode metadata on the standby.

So we have two possible solutions, A or B:

A)

Run the following recovery on the standby NameNode:

su hdfs
hdfs namenode -recover

B)

Put the active NN in safe mode:

su hdfs 
hdfs dfsadmin -safemode enter

Run a saveNamespace operation on the active NN:

su hdfs 
hdfs dfsadmin -saveNamespace

Leave safe mode:

su hdfs
hdfs dfsadmin -safemode leave

Log in to the standby NN.

Run the command below on the standby NameNode to fetch the latest fsimage saved in the steps above:

su hdfs
hdfs namenode -bootstrapStandby -force
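Option B can be sketched as one script so the order is fixed and safe mode is always left even if `saveNamespace` fails. This is a hypothetical wrapper, not an official tool: it assumes it is run as the `hdfs` user, first with `active` on the active NameNode host and then with `standby` on the standby host; `DRY_RUN=1` only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for option B; run it as the hdfs user.
set -euo pipefail

# DRY_RUN=1 prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-0}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 1-3, on the active NameNode host.
on_active() {
  local rc=0
  run hdfs dfsadmin -safemode enter
  # Record a failure instead of exiting, so safe mode is still left afterwards.
  run hdfs dfsadmin -saveNamespace || rc=$?
  run hdfs dfsadmin -safemode leave
  return "$rc"
}

# Step 4, on the standby NameNode host (its NameNode process is expected
# to be stopped, as it is here).
on_standby() {
  run hdfs namenode -bootstrapStandby -force
}

if [ "$#" -gt 0 ]; then
  case "$1" in
    active) on_active ;;
    standby) on_standby ;;
    *) echo "usage: $0 [active|standby]" >&2; exit 2 ;;
  esac
fi
```

For example, `DRY_RUN=1 ./option_b.sh active` prints the three `dfsadmin` commands without running them (`option_b.sh` is just an example file name).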

Which is the preferred solution for our problem?

OneCricketeer
jessica
