
I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody please give me an example of how to use the difflib module?

When I try something like:

result = difflib.SequenceMatcher(None, testFile, comparisonFile)

I get an error saying: object of type 'file' has no len()

kame
101010110101

5 Answers


For starters, you need to pass strings to difflib.SequenceMatcher, not files:

import difflib

# Like so
difflib.SequenceMatcher(None, str1, str2)

# Or just read the files in
difflib.SequenceMatcher(None, file1.read(), file2.read())

That'll fix your error anyway. To get the first non-matching string, I'll direct you to the wonderful world of difflib documentation.
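A minimal sketch of that, assuming Python 3 and files small enough to fit in memory. Passing lists of lines (rather than whole strings) makes SequenceMatcher compare line by line, and get_opcodes() reports the first block whose tag is not 'equal'. The function name first_difference is hypothetical, not part of difflib.

```python
import difflib


def first_difference(test_lines, comparison_lines):
    """Return the first non-matching block as (tag, test_part, comparison_part).

    Passing lists of lines makes SequenceMatcher compare whole lines rather
    than individual characters. Returns None when the sequences are equal.
    """
    # autojunk=False turns off the heuristic that, for sequences of 200+
    # items, treats very frequent lines as junk.
    matcher = difflib.SequenceMatcher(None, test_lines, comparison_lines,
                                      autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is 'replace', 'delete' or 'insert'
            return tag, test_lines[i1:i2], comparison_lines[j1:j2]
    return None


if __name__ == "__main__":
    # With real files you would pass e.g. open("test.txt").readlines()
    test = ["one\n", "two\n", "three\n"]
    comparison = ["one\n", "TWO\n", "three\n"]
    print(first_difference(test, comparison))
```

The tag tells you what kind of difference was found: 'replace' for changed lines, 'delete' for lines only in the test file, and 'insert' for lines only in the comparison file.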

Triptych
    @OP: In addition to the docs, have a look at Doug Hellmann's excellent Python module-of-the-week difflib entry: http://blog.doughellmann.com/2007/10/pymotw-difflib.html – mechanical_meat Jun 10 '09 at 20:11
    @BlackVegetable [link to the web archive project](https://web.archive.org/web/20130527085140/http://doughellmann.com/2007/10/pymotw-difflib.html) and [Python Module of the week link](https://pymotw.com/3/difflib/index.html) – BarathVutukuri Nov 29 '19 at 10:06

Here is a quick example of comparing the contents of two files using Python's difflib:

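import difflib

file1 = "myFile1.txt"
file2 = "myFile2.txt"

with open(file1) as f1, open(file2) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff yields lines that already end in '\n', so don't add another
print(''.join(diff), end='')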
Vyke
    How can we avoid displaying lines that are the same? I just want the lines that differ to be printed. – JahMyst Jan 20 '16 at 22:13
    @OlivierCervello
    import difflib, sys
    with open("a") as a:
        a_content = a.readlines()
    with open("b") as b:
        b_content = b.readlines()
    diff = difflib.unified_diff(a_content, b_content)
    print("***** Unified diff ************")
    print("Line no" + '\t' + 'file1' + '\t' + 'file2')
    for i, line in enumerate(diff):
        if line.startswith("-"):
            print(i, '\t\t' + line)
        elif line.startswith("+"):
            print(i, '\t\t\t\t\t\t' + line)
    – kishorebjv Mar 25 '16 at 12:15
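A hedged sketch of another way to print only the differing lines, assuming Python 3. unified_diff() with n=0 emits no context lines, so once the two file-header lines ('---' / '+++') and the '@@' hunk headers are dropped, only '-' (removed) and '+' (added) lines remain. The helper name differing_lines is hypothetical.

```python
import difflib


def differing_lines(old_lines, new_lines):
    """Return only the removed ('-') and added ('+') lines."""
    # n=0 means zero lines of context, so unchanged lines are not emitted.
    diff = list(difflib.unified_diff(old_lines, new_lines, n=0))
    # When there is any difference, the first two lines are the
    # '---' and '+++' file headers; hunk headers start with '@@'.
    return [line for line in diff[2:] if not line.startswith("@@")]


if __name__ == "__main__":
    # With real files, pass the readlines() of each file instead.
    old = ["a\n", "b\n", "c\n"]
    new = ["a\n", "B\n", "c\n", "d\n"]
    print("".join(differing_lines(old, new)), end="")
```

Skipping the first two lines by position (instead of testing for a '---' prefix) avoids accidentally dropping a removed line whose own text begins with '--'.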

Are you sure both files exist?

I just tested it and got a perfect result.

To get the results, I use something like:

import difflib

with open(testFile) as f1, open(comparisonFile) as f2:
    diff = difflib.ndiff(f1.readlines(), f2.readlines())

# ndiff returns a generator, so just iterate over it
for line in diff:
    print(line, end='')

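The first two characters of each line indicate how it differs: '- ' means the line appears only in the first file, '+ ' means it appears only in the second, '  ' means it is common to both, and '? ' lines are hints pointing at differences within a line.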
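Building on that, a minimal sketch (Python 3; the helper name first_ndiff_difference is hypothetical) that returns the first ndiff line marking a difference. It skips common lines ('  ') and '? ' hint lines, and stops at the first '- ' or '+ ' line:

```python
import difflib


def first_ndiff_difference(test_lines, comparison_lines):
    """Return the first ndiff line starting with '- ' or '+ ', or None."""
    for line in difflib.ndiff(test_lines, comparison_lines):
        # '  ' = in both inputs, '? ' = intraline hint: neither is a difference
        if line.startswith(("- ", "+ ")):
            return line
    return None


if __name__ == "__main__":
    # With real files, pass the readlines() of each file instead.
    print(first_ndiff_difference(["one\n", "two\n"], ["one\n", "xyz\n"]), end="")
```

If the result is None, the two inputs are identical line for line.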

RSabet
    oops, you're right, silly mistake. But I'm still not sure how to get the data I need out of result. How do I even know whether they differ? How can I get the first string that differs? Sorry, lots of questions :( – 101010110101 Jun 10 '09 at 19:09

It sounds like you may not need difflib at all. If you're comparing line by line, try something like this:

with open("test.txt") as f:
    test_lines = f.readlines()
with open("correct.txt") as f:
    correct_lines = f.readlines()

for test, correct in zip(test_lines, correct_lines):
    if test != correct:
        print("Oh no! Expected %r; got %r." % (correct, test))
        break
else:
    # No mismatch within the common length; zip() stops at the shorter list
    len_diff = len(test_lines) - len(correct_lines)
    if len_diff > 0:
        print("Test file had too much data.")
    elif len_diff < 0:
        print("Test file had too little data.")
    else:
        print("Everything was correct!")
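If you also want to know where the mismatch is, one possible variant (Python 3; first_mismatch is a hypothetical name) uses itertools.zip_longest so that a missing or extra line shows up as a mismatch against None, and enumerate to report a 1-based line number:

```python
from itertools import zip_longest


def first_mismatch(test_lines, correct_lines):
    """Return (line_number, expected, got) for the first mismatch, or None."""
    # zip_longest pads the shorter list with None, so extra or missing
    # lines count as mismatches instead of being silently dropped.
    pairs = zip_longest(test_lines, correct_lines)
    for number, (test, correct) in enumerate(pairs, start=1):
        if test != correct:
            return number, correct, test
    return None


if __name__ == "__main__":
    # With real files, pass the readlines() of each file instead.
    print(first_mismatch(["a\n", "b\n"], ["a\n", "c\n"]))
```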
Filip Salomonsson

Another, simpler way to check whether two text files are the same line by line. Try it out:

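fname1 = 'text1.txt'
fname2 = 'text2.txt'

with open(fname1) as f1, open(fname2) as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()

# Stop at the first line that differs and print it
for line1, line2 in zip(lines1, lines2):
    if line1 != line2:
        print(line1, end='')
        break
else:
    # zip() stops at the shorter file, so also compare the line counts
    if len(lines1) == len(lines2):
        print("both are equal")
    else:
        print("files have a different number of lines")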

Otherwise, the standard library's filecmp module has a ready-made function you can use:

import filecmp

fname1 = 'text1.txt'
fname2 = 'text2.txt'

print(filecmp.cmp(fname1, fname2))

:)
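Note that filecmp.cmp() only tells you whether the files match, not where they differ. With its default shallow=True, it treats files whose os.stat() signatures (file type, size, modification time) are identical as equal without reading them. A hedged sketch that forces a content comparison with shallow=False; the helper name and demo file names are hypothetical:

```python
import filecmp
import os
import tempfile


def same_contents(path1, path2):
    """Return True if the two files contain identical bytes."""
    # shallow=False: don't accept matching os.stat() signatures as proof
    # of equality; always compare the file contents.
    return filecmp.cmp(path1, path2, shallow=False)


if __name__ == "__main__":
    # Demo with throwaway files; the names are arbitrary.
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "text1.txt")
        second = os.path.join(tmp, "text2.txt")
        for path in (first, second):
            with open(path, "w") as f:
                f.write("hello\n")
        print(same_contents(first, second))  # True
```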

biertje72