get_matching_blocks()
returns the longest contiguous matching subsequence. Here the longest matching subsequence is 'banana' in both the strings, with length 6. Hence it is returning 6.
Try this instead:
def similar(a,b):
c = 'something' # Initialize this to anything to make the while loop condition pass for the first time
sum = 0
while(len(c) != 1):
c = SequenceMatcher(lambda x: x == ' ',a.lower(),b.lower()).get_matching_blocks()
sizes = [i.size for i in c]
i = sizes.index(max(sizes))
sum += max(sizes)
a = a[0:c[i].a] + a[c[i].a + c[i].size:]
b = b[0:c[i].b] + b[c[i].b + c[i].size:]
return sum
This "subtracts" the matching part of the strings, and matches them again, until len(c)
is 1, which would happen when there are no more matches left.
However, this script doesn't ignore spaces. In order to do that, I used the suggestion from this other SO answer: just preprocess the strings before you pass them to the function like so:
a = 'Apple Banana'.replace(' ', '')
b = 'Banana Apple'.replace(' ', '')
You can include this part inside the function too.