Is Namenode still necessary if I use S3 instead of HDFS?

Question

I recently set up my Hadoop cluster over an object store (S3). All data files are stored in S3 instead of HDFS, and I have successfully run Spark and MapReduce jobs over S3. Is my namenode still necessary? If so, what does the namenode do while I am running Hadoop applications over S3? Thanks.

score 4 · Accepted Answer · answered Nov 06 '17 at 12:12

No, provided you have a way to deal with the fact that S3 lacks the consistency needed by the output committers that ship by default. Those committers rely on directory listings to find the work to commit, so every so often, if S3's listings are inconsistent enough, your results will be invalid and you won't even notice.
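To make the failure mode concrete, here is a toy simulation of a commit step that promotes only the files a listing returns. It is a sketch, not Hadoop code: the function names, file names, and the idea of modelling a stale listing as a set are all assumptions made for illustration.

```python
# Toy model only; this does not call Hadoop or S3. All names are hypothetical.

def commit_by_listing(written, visible_in_listing):
    """Promote only the task output files that the listing returns."""
    return sorted(f for f in written if f in visible_in_listing)


def missing_after_commit(written, committed):
    """Files that were written but never made it into the final output."""
    return sorted(set(written) - set(committed))


# Three task output files were written successfully.
written = ["part-00000", "part-00001", "part-00002"]

# Assumption: the store's listing lags and omits the newest object.
stale_listing = {"part-00000", "part-00001"}

committed = commit_by_listing(written, stale_listing)
lost = missing_after_commit(written, committed)

# No exception is raised anywhere: the commit "succeeds" with a file missing.
print("committed:", committed)
print("silently lost:", lost)
```

Nothing in the commit step fails, which is why the invalid result goes unnoticed unless the output is checked against what the tasks actually wrote.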

Different suppliers of Spark on AWS each solve this in their own way. If you are using ASF Spark, nothing bundled with it does this for you.

https://www.youtube.com/watch?v=BgHrff5yAQo

