2

I am looking for a robust solution to define a unique identifier for measurement data files. I collect the data from different sources, mainly from network storage. The data files might be renamed and copied more than once to different locations. The method only needs to run on Windows platform. So far I do the following: create an ID from the last modification time and the size of the file. I assume that the file will only once be created during the measurement process and never be modified afterwards. This is my current implementation:

import pathlib
import datetime

def file_uid(file):

    fname = pathlib.Path(file)
    mod_time = datetime.datetime.fromtimestamp(fname.stat().st_mtime).strftime("%d.%m.%Y %H:%M:%S")
    file_size = fname.stat().st_size
    uid = '%s%s%s' %(mod_time,'_',str(file_size))
    return uid

Can this idea work, or did I miss something in general? What will be the best practice to accomplish a robust solution for this issue? Or should I go with some checksum algorithm and what would be recommended?

dspencer
  • 3,649
  • 4
  • 15
  • 35
Lama
  • 63
  • 7
  • 2
    The files might be renamed and copied, and you still want to identify them as the same file? A hash of the file would be a good way to test this. See [Hashing a file in Python](https://stackoverflow.com/questions/22058048/hashing-a-file-in-python) – dspencer Apr 03 '20 at 08:02
  • It sounds like you are about to reinvent the wheel. That wheel is called git – mvp Apr 03 '20 at 08:08
  • @mvp For "measurement data files"? If these are large, git is a poor choice, right? – dspencer Apr 03 '20 at 08:09
  • @mvp Your are absolute right the issue is the mess of data. I unfortunately I got no influence on that. Using Git as also mentioned form "dspencer" is due to the size of the binary data not a good choice. We use on other things the ASAM ODS-Server concept. But unfortunately not here. – Lama Apr 03 '20 at 08:37

1 Answers1

0

I would recommend assigning each file a short UDID. you can use something such as shortuuid:

pip install shortuuid

and then just

shortuuid.ShortUUID().random(length=22)
Bogdan Veliscu
  • 436
  • 5
  • 9
  • Thanks for the idea. Unfortunatly I not pointed out. That the unique id need also be realated to the data within the file. To avoid reading different files according to there file name and maybe storage position which just got the same data. Due to the fact that someone renamed or relocated the files. – Lama Apr 03 '20 at 08:15