3

Can I use regular expressions in difflib?

Specifically, I'd like to do:

difflib.context_diff(actual, gold)

Where actual is:

[master 92a406f] file modified

and gold is:

\[master \w{7}\] file modified
Jonathan
  • What are you trying to do? To me it looks like you just want to match the actual against the gold regular expression. Why would you want to use difflib for that? – Michael Jul 18 '11 at 15:20

3 Answers

3

It looks like you mean that you want to ignore the 92a406f part of the actual file. You should write a scrubber that uses regexes to scrub the parts you want to ignore:

actual = re.sub(r"\[master \w{7}\]", "[master *******]", actual)

Then store the scrubbed gold file, and use standard difflib to compare the scrubbed actual output to the scrubbed gold.
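A minimal sketch of the scrubbing approach, assuming the actual and gold output are plain strings. The `scrub` helper name and the gold commit hash are made up for illustration; the pattern and replacement are the ones from the `re.sub` call above.

```python
import difflib
import re


def scrub(text):
    """Hypothetical helper: mask the 7-character commit hash."""
    return re.sub(r"\[master \w{7}\]", "[master *******]", text)


actual = "[master 92a406f] file modified\n"
gold = "[master 1234567] file modified\n"  # made-up hash, for illustration only

# Scrub both sides, then diff as usual.
diff = list(
    difflib.context_diff(
        scrub(actual).splitlines(keepends=True),
        scrub(gold).splitlines(keepends=True),
        fromfile="actual",
        tofile="gold",
    )
)

# An empty diff means the two outputs differ only in the scrubbed parts.
print("".join(diff) or "no differences")
```

Scrubbing is idempotent here, since the masked text `*******` no longer matches `\w{7}`, so it should be safe to run over a gold file that was already scrubbed.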

Ned Batchelder
3

If you really want to pursue a regex-based diff, then you can create your own string-like object that defines __eq__ based on regex matching, and use difflib on a sequence of those objects. I wouldn't recommend it, though.
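A rough sketch of what such an object could look like, not a tested recipe. The `Line` class and its `is_pattern` flag are hypothetical names. One assumption worth calling out: `difflib.SequenceMatcher` looks elements up in a dict, so the objects also need a `__hash__`, and here every object hashes to the same value so that the dict falls back to `__eq__`. That makes lookups slow on long inputs, and a line that matches two different patterns may only be paired with one of them.

```python
import difflib
import re


class Line:
    """Hypothetical wrapper: a line that is either literal text or a regex."""

    def __init__(self, text, is_pattern=False):
        self.text = text
        self.is_pattern = is_pattern

    def __eq__(self, other):
        if not isinstance(other, Line):
            return NotImplemented
        if self.is_pattern == other.is_pattern:
            # Two literals or two patterns: compare the raw text.
            return self.text == other.text
        # One pattern, one literal: the pattern must match the whole line.
        pattern, literal = (self, other) if self.is_pattern else (other, self)
        return re.fullmatch(pattern.text, literal.text) is not None

    def __hash__(self):
        # Constant hash (an assumption of this sketch) so that dict lookups
        # inside difflib always end up calling __eq__.
        return 0


actual = [Line("[master 92a406f] file modified"), Line("1 file changed")]
gold = [
    Line(r"\[master \w{7}\] file modified", is_pattern=True),
    Line("1 file changed"),
]

matcher = difflib.SequenceMatcher(None, actual, gold, autojunk=False)
opcodes = matcher.get_opcodes()
print(opcodes)
```

This uses `SequenceMatcher` directly rather than `context_diff`, because `context_diff` concatenates each line with a string prefix and plain objects like these would not support that.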

Ned Batchelder
  • Hi, how do you handle the comparison? The difflib source only compares character by character and doesn't provide a range of characters: `a[besti-1] == b[bestj-1]` and `a[besti+bestsize] == b[bestj+bestsize]`. – Julio Oct 07 '14 at 11:59
2

What I did: replace difflib's find_longest_match function with a copy in which every == comparison is replaced by a call to a check. When the two sides are not equal, the check tries to interpret the left side as a regexp and match it against the right side (it returns true on any error, e.g. when the left side is not a valid regexp).

I use it to match expected output in unit tests, and so far it works fine.
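The check described above might look roughly like this. It is a reconstruction from the description only, not the original patch: the name `regex_aware_equal` and the use of `re.fullmatch` (rather than `re.match` or `re.search`) are assumptions.

```python
import re


def regex_aware_equal(left, right):
    """Hypothetical stand-in for `left == right` in a patched find_longest_match."""
    if left == right:
        return True
    try:
        # Not equal: try to treat the left side as a regular expression.
        return re.fullmatch(left, right) is not None
    except (re.error, TypeError):
        # As described above: any error (e.g. an invalid regexp) counts as true.
        return True


print(regex_aware_equal(r"\[master \w{7}\] file modified",
                        "[master 92a406f] file modified"))
```

Note that this only covers the comparison itself. The copied find_longest_match would still find candidate positions through `self.b2j`, which is a plain dict lookup.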

PlasmaHH
  • Hi, I am interested in your solution. Is it possible to edit your post to include your patch? It seems that you changed the following lines: `a[besti-1] == b[bestj-1]` and `a[besti+bestsize] == b[bestj+bestsize]` but what about `self.b2j`? – Julio Oct 07 '14 at 11:53
  • @Julio: Unfortunately the patch resides at a company I no longer work with. But afaicr it was just the `==` replacements, I can not remember doing anything with b2j (which may or may not mean that mine has a bug, but it does what I needed those days) – PlasmaHH Oct 07 '14 at 12:11