I'm trying to write a Python script that gets the md5sum of all files in a directory (on Linux), which I believe I have done in the code below.
I want to be able to run this to make sure that no files within the directory have changed, and that no files have been added or deleted.
The problem is that if I make a change to a file in the directory and then change it back, I get a different result from running the function below (even though the file's contents are the same as before).
Can anyone explain this, and can you think of a workaround?
import hashlib
import os
import tarfile

def get_dir_md5(dir_path):
    """Build a tar file of the directory and return its md5 sum"""
    temp_tar_path = 'tests.tar'
    t = tarfile.TarFile(temp_tar_path, mode='w')
    t.add(dir_path)
    t.close()
    m = hashlib.md5()
    # read the whole archive back and hash its bytes
    with open(temp_tar_path, 'rb') as f:
        m.update(f.read())
    ret_str = m.hexdigest()
    # delete tar file
    os.remove(temp_tar_path)
    return ret_str
Edit: As these fine folks have answered, it looks like tar stores header information such as the modification date for each file, so the archive bytes change even when the file contents do not. Would zip, or some other format, behave any differently?
Any other ideas for workarounds?
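
One direction that might sidestep the header problem is to skip the archive entirely and feed only the relative paths and file contents into the hash, in a fixed order. The following is a rough sketch rather than a settled solution: the name `get_dir_content_md5` and the 64 KiB chunk size are my own choices, and it assumes the directory contains only readable regular files (empty directories and file permissions are not reflected in the digest).

```python
import hashlib
import os


def get_dir_content_md5(dir_path):
    """Hash relative paths and file contents only, ignoring timestamps.

    Hypothetical helper: assumes every entry is a readable regular file.
    """
    m = hashlib.md5()
    for root, dirs, files in os.walk(dir_path):
        dirs.sort()  # sorting in place makes the traversal order deterministic
        for name in sorted(files):
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, dir_path)
            # include the path so that added, deleted, or renamed files change the digest
            m.update(rel_path.encode('utf-8'))
            m.update(b'\0')  # separator between the path and the file contents
            with open(full_path, 'rb') as f:
                # read in chunks so large files are not loaded into memory at once
                for chunk in iter(lambda: f.read(65536), b''):
                    m.update(chunk)
    return m.hexdigest()
```

Because modification times never enter the hash, changing a file and then changing it back should give the same digest as before, while adding, removing, or renaming a file should give a different one.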