Questions tagged [difflib]

A Python module that provides tools for computing and working with differences between sequences, especially useful for comparing text. It includes functions that produce reports in several common difference formats.

A Python module that provides classes and functions for comparing sequences. It can be used, for example, to compare files, and it can produce difference information in various formats, including HTML and context and unified diffs.
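As a rough illustration of the tools the tag description mentions, here is a minimal sketch using three standard-library entry points: `difflib.unified_diff`, `difflib.SequenceMatcher`, and `difflib.get_close_matches`. The sample lines, file labels, and word list are made up for the example and do not come from any question on this page.

```python
import difflib

# Two small "files" as lists of lines (hypothetical sample data).
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta2\n", "gamma\n"]

# unified_diff yields the diff line by line; fromfile/tofile are only labels.
diff = list(difflib.unified_diff(old, new, fromfile="old.txt", tofile="new.txt"))
print("".join(diff), end="")

# SequenceMatcher takes the junk filter as its first positional argument,
# so pass None explicitly (or use a= and b= keywords) when comparing strings.
ratio = difflib.SequenceMatcher(None, "abcd", "ab123").ratio()
print(round(ratio, 3))

# get_close_matches returns the best candidates, most similar first.
matches = difflib.get_close_matches("appel", ["ape", "apple", "peach", "puppy"])
print(matches)
```

Note that `ratio()` returns a float between 0 and 1, so multiply by 100 if a percentage is wanted.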

271 questions
148
votes
2 answers

High performance fuzzy string comparison in Python, use Levenshtein or difflib

I am doing clinical message normalization (spell check), in which I check each given word against a 900,000-word medical dictionary. I am mostly concerned about time complexity and performance. I want to do fuzzy string comparison, but I'm not sure which…
Maggie
33
votes
2 answers

How to use SequenceMatcher to find similarity between two strings?

import difflib a='abcd' b='ab123' seq=difflib.SequenceMatcher(a=a.lower(),b=b.lower()) seq=difflib.SequenceMatcher(a,b) d=seq.ratio()*100 print d I used the above code, but the output I get is 0.0. How can I get a valid answer?
joolie
30
votes
6 answers

Generating and applying diffs in python

Is there an 'out-of-the-box' way in Python to generate a list of differences between two texts, and then apply this diff to one file to obtain the other later? I want to keep the revision history of a text, but I don't want to save the entire…
noio
23
votes
5 answers

Comparing two .txt files using difflib in Python

I am trying to compare two text files and output the first string in the comparison file that does not match, but I am having difficulty since I am very new to Python. Can anybody please give me a sample way to use this module? When I try something…
101010110101
20
votes
2 answers

python difflib comparing files

I am trying to use difflib to produce a diff for two text files containing tweets. Here is the code: #!/usr/bin/env python # difflib_test import difflib file1 = open('/home/saad/Code/test/new_tweets', 'r') file2 = open('/home/saad/PTITVProgs',…
koogee
14
votes
3 answers

How to use Python's difflib to produce side-by-side comparison of two files similar to Unix sdiff command?

I am using Python 2.6 and I want to create a simple GUI with two side-by-side text panes comparing two text files (file1.txt & file2.txt). I am using difflib, but it is not clear to me how to produce a result similar to the sdiff Unix command.…
zml
13
votes
3 answers

Ignore case with difflib.get_close_matches()

How can I tell difflib.get_close_matches() to ignore case? I have a dictionary which has a defined format which includes capitalisation. However, the test string might have full capitalisation or no capitalisation, and these should be equivalent.…
rudivonstaden
11
votes
4 answers

sequence matching algorithm in python

I have a list of sentences such as this: errList = [ 'Ragu ate lunch but didnt have Water for drinks', 'Rams ate lunch but didnt have Gatorade for drinks', 'Saya ate lunch but didnt have :water for drinks', 'Raghu…
NullException
11
votes
1 answer

"diff -u -B -w" in python?

Using Python, I'd like to output the difference between two strings as a unified diff (-u) while, optionally, ignoring blank lines (-B) and spaces (-w). Since the strings were generated internally, I'd prefer to not deal with nuanced complexity of…
cagney
10
votes
3 answers

ignore spaces when comparing strings in python

I am using the difflib Python package. Whether or not I set the isjunk argument, the calculated ratios are the same. Isn't the difference of spaces ignored when isjunk is lambda x: x == " "? In [193]: difflib.SequenceMatcher(isjunk=lambda x: x == " ",…
RNA
8
votes
1 answer

Is there an alternative to `difflib.get_close_matches()` that returns indexes (list positions) instead of a str list?

I want to use something like difflib.get_close_matches but instead of the most similar strings, I would like to obtain the indexes (i.e. position in the list). The indexes of the list are more flexible because one can relate the index to other data…
toto_tico
8
votes
6 answers

Python - getting just the difference between strings

What's the best way of getting just the difference from two multiline strings? a = 'testing this is working \n testing this is working 1 \n' b = 'testing this is working \n testing this is working 1 \n testing this is working 2' diff =…
Rekovni
8
votes
3 answers

How does the python difflib.get_close_matches() function work?

The following are two arrays: import difflib import scipy import numpy a1=numpy.array(['198.129.254.73','134.55.221.58','134.55.219.121','134.55.41.41','198.124.252.101'],…
Dexters
7
votes
1 answer

Python Difflib's SequenceMatcher does not find Longest Common Substrings

I want to use difflib.SequenceMatcher to extract longest common substrings from two strings. I'm not sure whether I found a bug or misunderstood the documentation of find_longest_match. This is the point that I find confusing: In other words, of…
Lukas Barth
7
votes
1 answer

Difflib's SequenceMatcher - Customized equality

I've been trying to create a nested or recursive effect with SequenceMatcher. The final goal is comparing two sequences, both of which may contain instances of different types. For example, the sequences could be: l1 = [1, "Foo", "Bar", 3] l2 = [1, "Fo",…
YaronK