
Given a directory containing, say, a few thousand files, please output a list of the names of all files in the directory whose contents are exactly the same.

I can write a script to read the files from the directory, but I need to compare the contents of every file in it, and the output should look like: matches {f1,f2} {f4,f6}

#!/usr/bin/env python

import sys
import glob
import errno
import os

path = '/home/harish/myfiles'
files = glob.glob(os.path.join(path, '*'))  # glob needs a wildcard to list the directory's entries.
for name in files:  # 'file' is a builtin type, 'name' is a less-ambiguous variable name.
    try:
        with open(name) as f: # No need to specify 'r': this is the default.
            sys.stdout.write(f.read())
    except IOError as exc:
        if exc.errno != errno.EISDIR: # Do not fail if a directory is found, just ignore it.
            raise # Propagate other kinds of IOError.
Tyson
  • What have you tried? – Nipun Garg Nov 03 '17 at 09:19
  • You may find [this thread](https://stackoverflow.com/questions/254350/in-python-is-there-a-concise-way-of-comparing-whether-the-contents-of-two-text) interesting. – Marco Milanesio Nov 03 '17 at 09:22
  • "please output a list of all the names of the files in the directory that are exactly the same": can you reword it, please? It's a little tricky to understand whether you want to compare the contents of all files, compare all the filenames, or find the cases that satisfy both criteria. – Zulfiqaar Nov 03 '17 at 09:40
  • It should compare the contents of all the files and output the list of files that are exactly the same. – Tyson Nov 03 '17 at 09:44

1 Answer


You can compute an MD5 checksum for each file:

import hashlib
md5sum = hashlib.md5(open(full_path, 'rb').read()).hexdigest()
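If some of the files are large, reading each one fully into memory may be undesirable. A minimal sketch of the same checksum computed in chunks follows; the helper name `file_md5` and the `chunk_size` default are illustrative assumptions, not part of the original answer.

```python
import hashlib

def file_md5(full_path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(full_path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b'' (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

The result is the same digest as hashing the whole file at once; only the peak memory use changes.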

Put them in a dictionary that maps each checksum to the list of file names with that checksum, like this: {'md5': ['file_name1', 'file_name2']}

Then print every list that contains more than one file name:

for key, value in md5s.items():
    if len(value) > 1:
        print(value)
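Putting the steps together, a rough end-to-end sketch could look like the following. The function name `find_duplicates` is hypothetical, and the sketch assumes only the top level of the directory should be scanned and that files are small enough to read in one go.

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(directory):
    """Group the regular files in `directory` by the MD5 of their contents."""
    md5s = defaultdict(list)  # checksum -> list of file names
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-regular entries
        with open(full_path, 'rb') as f:
            md5s[hashlib.md5(f.read()).hexdigest()].append(name)
    # Keep only the checksums shared by more than one file.
    return [names for names in md5s.values() if len(names) > 1]
```

Calling `find_duplicates('/home/harish/myfiles')` would return a list of groups such as `[['f1', 'f2'], ['f4', 'f6']]` when those files match. Equal MD5 sums make identical contents overwhelmingly likely for ordinary files; if you want certainty, each group can be confirmed byte by byte with `filecmp.cmp(a, b, shallow=False)`.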
Prime Reaper