I need to synchronize files from directory A to directory B. I go through the files in A and compare them with the files in B one by one. If a file in A has a same-named counterpart in B, I first compare their sizes. If the sizes differ, the files are different, so I log this and move on to the next file. If the sizes are the same, however, I still need to verify whether the contents are the same. For this, I thought of computing a hash of each file and comparing the hashes. Is this better, or should I compare the files byte by byte? Please also explain why you would choose one method over the other.
I am using C# (.NET 4). I need to preserve all files in B, copy over files newly added to A, and report (and skip) any duplicates.
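To make the byte-by-byte option concrete, here is a rough sketch of what I have in mind. The class name `FileCompare`, the method name `ContentsEqual`, and the 64 KB buffer size are placeholders I made up for illustration, not existing code.

```csharp
using System;
using System.IO;

static class FileCompare
{
    // Hypothetical helper: returns true when both files have the same
    // length and identical bytes.
    public static bool ContentsEqual(string pathA, string pathB)
    {
        // Cheap check first: different sizes means different contents.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        const int BufferSize = 64 * 1024; // assumed size, not tuned
        var bufA = new byte[BufferSize];
        var bufB = new byte[BufferSize];

        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = ReadBlock(a, bufA);
                int readB = ReadBlock(b, bufB);
                if (readA != readB) return false; // one file ended early
                if (readA == 0) return true;      // both at end, no difference found

                for (int i = 0; i < readA; i++)
                {
                    if (bufA[i] != bufB[i]) return false; // stop at first mismatch
                }
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends.
    private static int ReadBlock(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

I would call it as `FileCompare.ContentsEqual(pathInA, pathInB)` after finding a same-named file in B. It reads both files in full only when they are actually identical; otherwise it stops at the first differing byte.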
Thanks.
EDIT: This job will run nightly. I have the option of storing hashes only for the files in directory B; directory A is populated dynamically, so I cannot pre-hash those files. Also, which hash algorithms are best suited for this purpose? I want to avoid hash collisions as well.
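For the hashing option, this is roughly how I would compute the value to store for each file in B. It is only a sketch: SHA-256 is an assumption on my part (which algorithm to use is part of the question), and `FileHasher` / `Sha256Hex` are made-up names.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class FileHasher
{
    // Hypothetical helper: hashes the file contents with SHA-256 and
    // returns the digest as an uppercase hex string.
    public static string Sha256Hex(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            // ComputeHash(Stream) reads the stream to its end.
            byte[] hash = sha.ComputeHash(stream);
            // BitConverter.ToString yields "AB-CD-..."; drop the dashes.
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

The stored string for a file in B would then be compared against `FileHasher.Sha256Hex(pathInA)` each night, which means every same-sized file in A still has to be read in full once to hash it.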