
Using Python, I'd like to output the difference between two strings as a unified diff (-u) while, optionally, ignoring blank lines (-B) and spaces (-w).

Since the strings were generated internally, I'd prefer not to deal with the complexity of writing one or both strings to a file, running GNU diff, fixing up the output, and finally cleaning up.

While difflib.unified_diff generates unified diffs, it doesn't seem to let me tweak how spaces and blank lines are handled. I've looked at its implementation and I suspect the only solution is to copy and hack that function's body.

Is there anything better?

For the moment I'm stripping the pad characters using something like:

import difflib
import re
import sys

l = "line 1\nline 2\nline 3\n"
r = "\nline 1\n\nline 2\nline3\n"
strip_spaces = True
strip_blank_lines = True

if strip_spaces:
    l = re.sub(r"[ \t]+", r"", l)
    r = re.sub(r"[ \t]+", r"", r)
if strip_blank_lines:
    l = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", l))
    r = re.sub(r"^\n", r"", re.sub(r"\n+", r"\n", r))
# run diff
diff = difflib.unified_diff(l.splitlines(keepends=True), r.splitlines(keepends=True))
sys.stdout.writelines(list(diff))

which, of course, produces a diff of something other than the original input. For instance, pass the above text to GNU diff 3.3 run as "diff -u -w" and "line 3" is displayed as part of the context, whereas the code above displays "line3".

cagney
  • "which, of course, results in diffs for something other than the original input." Sure, but that's what diff does, right? OTOH, I'm sure that diff replaces whitespace with a single blank rather than no blanks... – Patrick Maupin Jul 31 '15 at 20:17
  • GNU diff 3.3 describes -w thus: "The `--ignore-all-space' (`-w') option is stronger still. It ignores differences even if one line has white space where the other line has none. ..." – cagney Aug 01 '15 at 00:51
  • @patrick No, diff uses the original input when displaying the context (and that includes things like correct line numbers), not something mangled beyond belief – cagney Aug 01 '15 at 01:02
  • Ah, display. I was thinking about the compare. I suppose you could keep track of line numbers out of diff and display the original, but at that point, you're right -- it probably makes more sense to fix difflib if it doesn't do that. – Patrick Maupin Aug 01 '15 at 01:47
  • And you're right, I was thinking of -b, not -w – Patrick Maupin Aug 01 '15 at 01:59
  • NP; GNU diff has too many options. – cagney Aug 04 '15 at 17:53
  • The code for unified_diff is python and not very long. The problem is that the comparisons are actually done by SequenceMatcher (also Python code but hurts my eyes to read). You can try to address it after the fact but it's easier said than done. – woot Oct 27 '15 at 05:31
  • Can't you just use GNU diff? – Andrea Corbellini Dec 25 '15 at 14:55
  • @woot you have a good point, I noticed that GNU diff still displays some "ignored" white space changes when they appear in the context of a non-white space change. It might be possible to just prune them after the event. – cagney Jan 27 '16 at 17:35

1 Answer

Write your own SequenceMatcher, copy the body of unified_diff, and replace SequenceMatcher with your own matcher.
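A lighter variant of the same idea may be enough for the -w case: instead of subclassing SequenceMatcher, feed the stock one whitespace-stripped comparison keys and print the original lines using the indices it returns. The following is only a sketch under that assumption. The names normalize, _format_range and unified_diff_ignore_space are made up for illustration and are not part of difflib, and blank-line handling (-B) is not covered.

```python
import difflib
import re


def normalize(line):
    # Hypothetical comparison key: drop all whitespace, as "diff -w" ignores it.
    return re.sub(r"\s+", "", line)


def _format_range(start, stop):
    # Unified-diff range ("start,length"), following the usual hunk conventions.
    beginning = start + 1
    length = stop - start
    if length == 1:
        return str(beginning)
    if not length:
        beginning -= 1  # empty ranges refer to the line before them
    return "{},{}".format(beginning, length)


def unified_diff_ignore_space(a, b, fromfile="", tofile="", n=3):
    """Yield unified-diff lines, matching on normalize(line) but
    printing the original, unmodified lines."""
    matcher = difflib.SequenceMatcher(
        None, [normalize(x) for x in a], [normalize(x) for x in b]
    )
    started = False
    for group in matcher.get_grouped_opcodes(n):
        if not started:
            started = True
            yield "--- {}\n".format(fromfile)
            yield "+++ {}\n".format(tofile)
        first, last = group[0], group[-1]
        yield "@@ -{} +{} @@\n".format(
            _format_range(first[1], last[2]), _format_range(first[3], last[4])
        )
        for tag, i1, i2, j1, j2 in group:
            if tag == "equal":
                # Context comes from the first input (a choice made here).
                for line in a[i1:i2]:
                    yield " " + line
                continue
            if tag in ("replace", "delete"):
                for line in a[i1:i2]:
                    yield "-" + line
            if tag in ("replace", "insert"):
                for line in b[j1:j2]:
                    yield "+" + line


if __name__ == "__main__":
    import sys

    l = "line 1\nline 2\nline 3\n"
    r = "\nline 1\n\nline 2\nline3\n"
    sys.stdout.writelines(
        unified_diff_ignore_space(
            l.splitlines(keepends=True), r.splitlines(keepends=True)
        )
    )
```

With the question's sample input, the context line is printed as "line 3" from the original text rather than the stripped "line3", and the hunk header carries the original line numbers. Ignoring blank lines as well would additionally need a mapping from filtered line indices back to the original ones, which this sketch does not attempt.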

Tomasz Jakub Rup