I am looking for a robust way to define a unique identifier for measurement data files. I collect the data from different sources, mainly network storage. The files might be renamed and copied more than once to different locations. The method only needs to run on Windows. So far I create an ID from the last modification time and the size of the file. I assume that each file is created only once during the measurement process and never modified afterwards. This is my current implementation:
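import datetime
import pathlib


def file_uid(file):
    """Build an ID from the file's last modification time and size."""
    stat = pathlib.Path(file).stat()  # query the file system only once
    # Note: fromtimestamp() converts to local time and the format drops sub-second precision.
    mod_time = datetime.datetime.fromtimestamp(stat.st_mtime).strftime("%d.%m.%Y %H:%M:%S")
    # Resulting format: "<dd.mm.YYYY HH:MM:SS>_<size in bytes>"
    return f"{mod_time}_{stat.st_size}"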
Can this idea work, or did I miss something fundamental? What is the best practice for a robust solution to this problem? Or should I use a checksum algorithm instead, and if so, which one would you recommend?
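For reference, the checksum variant I have in mind would look roughly like the sketch below. It assumes SHA-256 from the standard `hashlib` module; the function name `file_checksum_uid` and the 1 MiB chunk size are my own placeholder choices, not something I have settled on.

```python
import hashlib
import pathlib


def file_checksum_uid(file, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file's contents (hypothetical helper)."""
    digest = hashlib.sha256()
    with pathlib.Path(file).open("rb") as fh:
        # Read in fixed-size chunks so large measurement files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Since this ID depends only on the file contents, it should stay the same after renaming or copying, at the cost of reading every file completely, which may be slow over network storage.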