The documentation is almost exactly the same for both, so I find it really hard to see the difference between them. Based on this Q&A, the sole difference is

[...] that checkpoint node can upload the new copy of fsimage file back to namenode after checkpoint creation where as a secondary namenode can’t upload [...]

It doesn't seem correct, because the Secondary NameNode can upload the new FsImage, based on this and this.

Can the Checkpoint Node be considered as a one-to-one replacement of the Secondary NameNode? What are the added benefits? Was the code cleaned up between the two or something like that?

kosii

3 Answers

In the Cloudera post you mentioned, the checkpointing process is clearly described for both the high-availability (HA) and non-HA scenarios. In the non-HA scenario, the Secondary NameNode performs checkpointing for the NameNode. In the HA scenario, the Standby NameNode can be used for checkpointing. In summary, checkpointing is more of a concept, and depending on the scenario (HA or non-HA), a different node performs that operation.

You can read that blog post again, and let me know if any corrections are needed. Happy learning!
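To make the idea of checkpointing as a concept more concrete, here is a minimal sketch of what a checkpoint does: it merges the last namespace snapshot (the fsimage) with the edits logged since that snapshot. This is a toy model and not Hadoop code. The dictionary layout, the `create`/`delete` operations, and the function names `apply_edit` and `checkpoint` are all assumptions made up for illustration.

```python
# Toy model only: this is not Hadoop code, and every name here is hypothetical.

def apply_edit(namespace, edit):
    """Apply one logged operation to an in-memory namespace (path -> metadata)."""
    op, path = edit["op"], edit["path"]
    if op == "create":
        namespace[path] = {"replication": edit.get("replication", 3)}
    elif op == "delete":
        namespace.pop(path, None)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(fsimage, edits):
    """Merge a namespace snapshot with the edits logged since it was taken."""
    # Copy the snapshot so the old fsimage is left untouched.
    merged = {path: dict(meta) for path, meta in fsimage["namespace"].items()}
    last_txid = fsimage["last_txid"]
    for edit in sorted(edits, key=lambda e: e["txid"]):
        if edit["txid"] <= last_txid:
            continue  # already reflected in the snapshot
        apply_edit(merged, edit)
        last_txid = edit["txid"]
    return {"last_txid": last_txid, "namespace": merged}


fsimage = {"last_txid": 2, "namespace": {"/a": {"replication": 3}}}
edits = [
    {"txid": 3, "op": "create", "path": "/b"},
    {"txid": 4, "op": "delete", "path": "/a"},
]
new_image = checkpoint(fsimage, edits)
print(new_image)
```

Whichever node runs the checkpoint (Secondary NameNode, Checkpoint Node, or Standby NameNode), the merge step is conceptually the same. The nodes differ in where the snapshot and edits come from and where the merged image ends up.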

Ramzy

The difference between the Secondary NameNode (SNN) and the Checkpoint Node is that the SNN stores the merged fsimage (fsimage plus edit logs) locally in its file system but does not upload it to the active NameNode, whereas the Checkpoint Node does.

Niks

Even the Apache documentation page does not cover the differences properly. From the documentation, the roles of the Secondary NameNode and the Checkpoint Node appear to be similar.

On a different note, I have raised a bug to correct the documentation page and avoid confusion: https://issues.apache.org/jira/browse/HDFS-8913

I hope this bug will be resolved soon.

Regarding your second query:

Can the Checkpoint Node be considered as a one-to-one replacement of the Secondary NameNode? What are the added benefits? Was the code cleaned up between the two or something like that?

A lot has changed with the Hadoop 2.x release, and the NameNode is no longer a single point of failure.

High availability of the active NameNode, with the help of a standby NameNode, is a key feature of Hadoop 2.x.

You just need an active NameNode and a standby NameNode to achieve high availability.

Hadoop 2.x High Availability has been explained clearly in other SE questions:

Hadoop namenode : Single point of failure

How does Hadoop Namenode failover process works?

Ravindra babu