I'm using Python 2.6.2. The docs for the filecmp module say:

The filecmp module defines functions to compare files and directories, with various optional time/correctness trade-offs.

and, of the filecmp.cmp function:

filecmp.cmp(f1, f2[, shallow])

Compare the files named f1 and f2, returning True if they seem equal, False otherwise.

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

What the docs don't specify is the level of correctness one gets with shallow=False. So, what does shallow=False do? How correct is it?

vanden

1 Answer

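Consulting the source, filecmp.py, shows that filecmp.cmp always starts by calling os.stat() on both files and extracting a signature of three properties: file type, size, and modification time. If either file is not a regular file, it returns False. If shallow is true and the two signatures are identical, it returns True without reading the files. Otherwise (shallow=False, or the signatures differ), it returns False when the sizes differ. If the sizes match, it checks its internal cache to see whether the same two files, with the same signatures, have already been compared, and if so it returns the cached result. Failing that, it reads both files in chunks of BUFSIZE = 8*1024 bytes and compares the contents exactly until it reaches the end of the file. It returns True only if the two files have exactly the same contents, so shallow=False gives a byte-for-byte comparison.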
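To make the order of the checks concrete, here is a rough sketch of that logic. It is not the library's code: `cmp_sketch`, `_signature`, and `demo` are names I made up, the cache is left out, and it is written for Python 3 (`tempfile.TemporaryDirectory` does not exist in 2.6). The demo builds two files with the same size and the same mtime but different bytes, which is exactly the case where shallow and non-shallow comparisons disagree.

```python
import filecmp
import os
import stat
import tempfile

BUFSIZE = 8 * 1024


def _signature(st):
    # Hypothetical helper: (file type, size, mtime) are the stat fields compared.
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def cmp_sketch(f1, f2, shallow=True):
    """Illustrative re-implementation of the decision order (no caching)."""
    s1 = _signature(os.stat(f1))
    s2 = _signature(os.stat(f2))
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False  # only regular files can compare equal
    if shallow and s1 == s2:
        return True   # shallow shortcut: trust the signature
    if s1[1] != s2[1]:
        return False  # different sizes, so contents cannot match
    # Byte-for-byte comparison in BUFSIZE chunks.
    with open(f1, "rb") as a, open(f2, "rb") as b:
        while True:
            chunk_a = a.read(BUFSIZE)
            chunk_b = b.read(BUFSIZE)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:
                return True  # both files ended at the same point


def demo():
    """Compare two files with equal size and mtime but different bytes."""
    with tempfile.TemporaryDirectory() as d:
        p1 = os.path.join(d, "a.txt")
        p2 = os.path.join(d, "b.txt")
        with open(p1, "wb") as f:
            f.write(b"hello")
        with open(p2, "wb") as f:
            f.write(b"world")
        # Force identical access and modification times on both files.
        os.utime(p1, (1000000000, 1000000000))
        os.utime(p2, (1000000000, 1000000000))
        return (
            filecmp.cmp(p1, p2, shallow=True),
            filecmp.cmp(p1, p2, shallow=False),
            cmp_sketch(p1, p2, shallow=True),
            cmp_sketch(p1, p2, shallow=False),
        )
```

Calling `demo()` should show the shallow comparisons reporting the files as equal and the non-shallow ones reporting them as different, for both the real `filecmp.cmp` and the sketch.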

Nav
  • I have recently clarified this in the official documentation of Python 3.3 – Eli Bendersky Jul 26 '12 at 08:12
  • How and when would one need to do a 'shallow comparison' between two files? The only difference is that it also treats equal mode, mtime, and size as file equality, but that doesn't make much sense to me. – Robert Bean Aug 14 '13 at 07:54
  • @RobertBean It saves you a lot of processing time. I think they determined that most of the time files with identical signatures are in fact identical. – NullUserException Aug 15 '14 at 16:40
  • @NullUserException Since the os.stat includes inode number, then os.stat would only be the same if the files are hardlinks, correct? (Though this may vary with filesystem, and os.stat follows symlinks?) – endolith Feb 24 '21 at 12:24