Python - getting just the difference between strings

Question

What's the best way of getting just the difference from two multiline strings?

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

import difflib

diff = difflib.ndiff(a, b)
print(''.join(diff))

This produces:

  t  e  s  t  i  n  g     t  h  i  s     i  s     w  o  r  k  i  n  g     
     t  e  s  t  i  n  g     t  h  i  s     i  s     w  o  r  k  i  n  g     1     
+  + t+ e+ s+ t+ i+ n+ g+  + t+ h+ i+ s+  + i+ s+  + w+ o+ r+ k+ i+ n+ g+  + 2

What's the best way of getting exactly:

testing this is working 2?

Would regex be the solution here?

@Chris_Rands nice hack but that's not a performant way to do it — rachid el kedmiri, Sep 27 '17 at 16:53
What is the point of using `split`? Why not just `b.replace(a, '')`? — ekhumoro, Sep 27 '17 at 17:07
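For reference, the character-by-character output in the question comes from passing whole strings to difflib.ndiff, which expects sequences of lines and so treats each character as its own item. A minimal sketch of a line-based call, assuming only the added lines (the ones ndiff prefixes with '+ ') are wanted; the name added is illustrative:

```python
import difflib

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# Compare lists of lines rather than raw strings.
diff = difflib.ndiff(a.splitlines(), b.splitlines())

# Keep only lines that ndiff marks as added, dropping the two-character prefix.
added = [line[2:] for line in diff if line.startswith('+ ')]

print('\n'.join(added).strip())
```

Lines starting with '- ' would be the ones present only in a, if those are needed as well.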

Kaushik NP · Answer 1 · 2017-09-27T17:23:39.100

5

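The easiest hack, credit to @Chris_Rands, is to use split(). It only works when the shorter string appears verbatim inside the longer one; otherwise split() finds nothing to cut out and the result is the whole longer string.

Note: you need to determine which string is longer and call split on that one.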

if len(a) > len(b):
    res = ''.join(a.split(b))  # get diff
else:
    res = ''.join(b.split(a))  # get diff

print(res.strip())  # remove whitespace on either side

# driver values

IN : a = 'testing this is working \n testing this is working 1 \n' 
IN : b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

OUT : testing this is working 2

EDIT: thanks to @ekhumoro for another hack using replace, which avoids the join step entirely.

if len(a) > len(b):
    res = a.replace(b, '')  # get diff
else:
    res = b.replace(a, '')  # get diff
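Since both hacks rely on one string being contained in the other, it may be worth checking that first. A rough sketch, where substring_diff is a hypothetical helper name and returning None for the unsupported case is just one possible choice:

```python
def substring_diff(a, b):
    """Return what is left of the longer string once the shorter is removed.

    Returns None when the shorter string is not contained in the longer one,
    because replace() would then leave the longer string unchanged.
    """
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    if shorter not in longer:
        return None
    return longer.replace(shorter, '').strip()


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

print(substring_diff(a, b))
print(substring_diff('abc', 'xyz'))
```

With the driver values this gives the expected line; for unrelated strings it returns None instead of silently handing back the longer string.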

edited Sep 27 '17 at 17:23

answered Sep 27 '17 at 16:55


`b.replace(a, '')` is simpler, faster, and makes more sense. – ekhumoro Sep 27 '17 at 17:16
Haha, love this. Another good hack. I never thought of using `split` or `replace` in such ways. Thank you @ekhumoro!! – Kaushik NP Sep 27 '17 at 17:19

score 5 · Accepted Answer · answered Sep 27 '17 at 17:02

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

splitA = set(a.split("\n"))
splitB = set(b.split("\n"))

diff = splitB.difference(splitA)
diff = ", ".join(diff)  # ' testing this is working 2, more things if there were...'

This turns each string into a set of lines and takes the set difference, i.e. everything in B that is not in A, then joins the result into one string. Note that sets do not preserve the original line order.

Edit: This is a convoluted way of saying what @ShreyasG said: [x for x in b_s if x not in a_s]...

score 5 · Answer 3 · answered Sep 27 '17 at 17:16

This is basically @Godron629's answer, but since I can't comment, I'm posting it here with a slight modification: changing difference for symmetric_difference so that the order of the sets doesn't matter.

a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# splitlines() avoids the empty string that split("\n") leaves after a trailing newline
splitA = set(a.splitlines())
splitB = set(b.splitlines())

diff = splitB.symmetric_difference(splitA)
diff = ", ".join(diff)  # ' testing this is working 2, some more things...'
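One thing to watch with the symmetric difference: a trailing newline in either string makes split("\n") produce an empty string, and that empty string shows up in the result because it exists on only one side. A small sketch of the effect, reusing the question's strings (with_split and with_splitlines are illustrative names):

```python
a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

# split("\n") keeps an empty string after the trailing newline of a.
with_split = set(a.split("\n")) ^ set(b.split("\n"))

# splitlines() does not, so only the real extra line remains.
with_splitlines = set(a.splitlines()) ^ set(b.splitlines())

print(sorted(with_split))
print(sorted(with_splitlines))
```

The ^ operator on sets is the same operation as symmetric_difference.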

score 0 · Answer 4 · answered Sep 27 '17 at 16:54

Building on @Chris_Rands's comment, you can also use splitlines() (if your strings are multi-line and you want the lines present in one but not the other):

b_s = b.splitlines()
a_s = a.splitlines()
[x for x in b_s if x not in a_s]

Expected output is:

[' testing this is working 2']
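The list membership test rescans a_s for every line of b_s, which can get slow for long inputs. A possible variant, assuming the lines are hashable strings, keeps the order of the second string but uses a set for the lookup (lines_only_in_second is a made-up name):

```python
def lines_only_in_second(first, second):
    """Return the lines of `second` that never occur in `first`, in order."""
    seen = set(first.splitlines())  # constant-time membership checks
    return [line for line in second.splitlines() if line not in seen]


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

print(lines_only_in_second(a, b))
```

Swapping the arguments gives the lines that exist only in the first string.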

pylang · Answer 5 · 2017-09-27T17:05:49.087

0

import itertools as it


"".join(y for x, y in it.zip_longest(a, b) if x != y)
# ' testing this is working 2'
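Keep in mind that the zip_longest approach compares characters position by position, so it is really only meaningful when one string is the other plus something appended. A short sketch that shows this; positional_diff is an illustrative name, and fillvalue="" is an added assumption so that join does not fail on None when b is the shorter string:

```python
import itertools as it


def positional_diff(a, b):
    """Collect the characters of b that differ from a at the same index."""
    return "".join(y for x, y in it.zip_longest(a, b, fillvalue="") if x != y)


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

print(repr(positional_diff(a, b)))          # appended text is recovered
print(repr(positional_diff("abc", "abXc")))  # an insertion shifts everything after it
```

In the second call the single inserted character pushes the rest of the string out of alignment, so more than just the insertion is reported.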

Alternatively

import collections as ct


ca = ct.Counter(a.split("\n"))
cb = ct.Counter(b.split("\n"))

diff = cb - ca
"".join(diff.keys())

edited Sep 27 '17 at 17:05

answered Sep 27 '17 at 17:00
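On the Counter variant: unlike a plain set, a Counter keeps track of how often each line occurs, so a line repeated more often in b than in a is still reported. A minimal sketch, with line_multiset_diff as a hypothetical helper name:

```python
import collections as ct


def line_multiset_diff(a, b):
    """Return lines that occur more often in b than in a, once per extra occurrence."""
    extra = ct.Counter(b.splitlines()) - ct.Counter(a.splitlines())
    return list(extra.elements())


print(line_multiset_diff("x\ny", "x\nx\ny"))
print(line_multiset_diff("x", "x\nz\nz"))
```

Counter subtraction drops zero and negative counts, so lines that exist only in a are ignored here.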


score 0 · Answer 6 · answered Jan 02 '19 at 23:19

You could use the following function:

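def __slave(a, b):
    # Return the index of the first item in a equal to b, or -1 if there is none.
    for i, l_a in enumerate(a):
        if b == l_a:
            return i
    return -1

def diff(a, b):
    # Walk through a and drop each matching character from a copy of b,
    # but only when the match is at or after the previous removal position.
    t_b = b
    c_i = 0
    for c in a:
        t_i = __slave(t_b, c)
        if t_i != -1 and t_i >= c_i:
            c_i = t_i
            t_b = t_b[:c_i] + t_b[c_i+1:]

    # Same pass in the other direction: drop characters of b from a copy of a.
    t_a = a
    c_i = 0
    for c in b:
        t_i = __slave(t_a, c)
        if t_i != -1 and t_i >= c_i:
            c_i = t_i
            t_a = t_a[:c_i] + t_a[c_i+1:]

    # Whatever is left over on either side is the difference.
    return t_b + t_a

Usage sample: print(diff(a, b))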
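If a character-level difference is what you are after, the standard library's difflib.SequenceMatcher can do the matching for you. This is a different technique from the hand-written function above, sketched here under the assumption that you want the removed and added text as two strings; char_changes is an illustrative name:

```python
import difflib


def char_changes(a, b):
    """Return (text only in a, text only in b) based on SequenceMatcher opcodes."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.append(a[i1:i2])
        if tag in ("insert", "replace"):
            added.append(b[j1:j2])
    return "".join(removed), "".join(added)


a = 'testing this is working \n testing this is working 1 \n'
b = 'testing this is working \n testing this is working 1 \n testing this is working 2'

removed, added = char_changes(a, b)
print(repr(removed), repr(added))
```

autojunk=False switches off the heuristic that ignores very frequent items in long sequences, which seems the safer default for plain text comparison.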
