We have a 150 GB data folder containing files of arbitrary formats (doc, jpg, png, txt, etc.). We need to compare the content of every file against every other file to find duplicates and, if any exist, print the list of their file paths. To do that, I first stored all the files in an ArrayList<File> and then compared them with the FileUtils.contentEquals(file1, file2) method. This works when I try it on a small folder, but on the 150 GB folder it doesn't show any result. I think storing all the files in an ArrayList first is causing the problem, possibly a JVM heap issue, but I am not sure.
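For reference, the approach described above looks roughly like the sketch below. It is a simplified illustration, not my exact code: the class and method names (PairwiseDuplicateFinder, listFiles, findDuplicatePairs) are made up for this post, and Files.mismatch from the JDK (Java 12+) stands in for FileUtils.contentEquals so that the snippet has no external dependency.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class PairwiseDuplicateFinder {

    // Collects every regular file under root into one in-memory list.
    static List<Path> listFiles(Path root) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    // Compares each file with every later file in the list.
    static List<String> findDuplicatePairs(Path root) throws IOException {
        List<Path> files = listFiles(root);
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) {
            for (int j = i + 1; j < files.size(); j++) {
                // Assumption: Files.mismatch is used here in place of
                // FileUtils.contentEquals; it returns -1 when the two
                // files have identical content.
                if (Files.mismatch(files.get(i), files.get(j)) == -1L) {
                    duplicates.add(files.get(i) + " == " + files.get(j));
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan; defaults to the current directory.
        Path root = Path.of(args.length > 0 ? args[0] : ".");
        findDuplicatePairs(root).forEach(System.out::println);
    }
}
```

With n files this performs up to n*(n-1)/2 content comparisons, and each comparison may read both files from disk.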
Does anyone have better advice or sample code for handling this amount of data? Please help me.