
I am trying to compare two sequences using `difflib.Differ()`. However, I am seeing some unexpected lines in the output that I don't understand. Can someone explain this behavior and how to resolve it?

import difflib

a = "abc-123 Abcdef"
b = "abc-123 Abcdef-def"
a = a.strip("\n")
b = b.strip("\n")
a = a.split(" ")
b = b.split(" ")

d = difflib.Differ()
result = list(d.compare(a,b))
for s in result:
    if s[0] == ' ':
        continue
    print(s)  # print() works in both Python 2 and Python 3

Output:

- Abcdef
+ Abcdef-def
?       ++++

Why is the `?` line reported here? I would expect only the first two lines (the actual changes) to be reported.

sarbjit

1 Answer


From the documentation:

Lines beginning with '?' attempt to guide the eye to intraline differences, and were not present in either input sequence.

In other words, the `?` line only marks where inside the line the change occurred; it is not an additional difference.

https://docs.python.org/2/library/difflib.html
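If you only want the actual changes, a minimal sketch is to keep the lines whose two-character prefix is `- ` or `+ ` and drop both the unchanged (`  `) lines and the `? ` hint lines. The helper name `changed_lines` is hypothetical, chosen just for this illustration:

```python
import difflib

def changed_lines(a, b):
    """Hypothetical helper: return only the '- ' and '+ ' lines of Differ output.

    Unchanged lines ('  ') and intraline hint lines ('? ') are dropped,
    since the '?' lines only annotate the '-'/'+' lines around them.
    """
    return [line for line in difflib.Differ().compare(a, b)
            if line.startswith(("- ", "+ "))]

a = "abc-123 Abcdef".split(" ")
b = "abc-123 Abcdef-def".split(" ")
for line in changed_lines(a, b):
    print(line)
# - Abcdef
# + Abcdef-def
```

Alternatively, `difflib.unified_diff` never emits `?` lines at all; it only produces header (`---`, `+++`, `@@`) lines plus `' '`, `-` and `+` lines, so it may suit you better if you don't need the intraline hints.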

Eran
  • If I change the sequences to `a = "abc-123 def"`, `b = "abc-123 def-def"`, then I see only two differences, i.e. `- def` and `+ def-def`. This is confusing. Is it safe to ignore lines starting with `?`? – sarbjit Feb 11 '15 at 11:43
  • @sarbjit, it is safe to ignore in the sense that whenever a `?` line is reported, the actual differences (the `-` and `+` lines) are reported too. But of course it carries extra information you wouldn't get otherwise (where in the line the difference is). It doesn't occur in your other example because it is only shown when the comparison considers the two lines the same apart from a minor variation and thinks it can show you where in the line they differ (what it considers as such depends on its own algorithm and is beyond me). – Eran Feb 11 '15 at 11:54
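For what it's worth, in CPython `Differ` only pairs a deleted line with an inserted line (and then emits `?` hints for them) when `difflib.SequenceMatcher` rates the two lines as similar enough, which in the CPython source means a ratio of about 0.75 or more. That cutoff is hard-coded in the private `Differ._fancy_replace` method rather than being documented API, so treat it as an implementation detail that could change. Computing the ratios for the two examples in this thread suggests why one gets a `?` line and the other does not (the helper name `similarity` is hypothetical):

```python
import difflib

def similarity(x, y):
    # Hypothetical helper. SequenceMatcher.ratio() returns 2*M/T, where M is
    # the number of matched characters and T is the total length of both strings.
    return difflib.SequenceMatcher(None, x, y).ratio()

print(similarity("Abcdef", "Abcdef-def"))  # 0.75: similar enough, '?' hint shown
print(similarity("def", "def-def"))        # 0.6: too dissimilar, plain -/+ lines

# The Differ output for the second pair contains no '?' line:
for line in difflib.Differ().compare(["abc-123", "def"], ["abc-123", "def-def"]):
    print(line)
```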