Questions tagged [sequencematcher]

For questions pertaining to SequenceMatcher from the Python difflib module. This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. difflib is part of the Python standard library.
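As a rough illustration of the description above, here is a minimal sketch of basic usage. The helper name `similarity` is hypothetical and chosen only for this example; `SequenceMatcher(None, a, b)` and `.ratio()` are the standard difflib calls.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return SequenceMatcher's similarity ratio (0.0 to 1.0) for two sequences.

    `similarity` is a hypothetical helper for illustration, not part of difflib.
    Passing None as the first argument means no custom junk function is used.
    """
    return SequenceMatcher(None, a, b).ratio()


# Strings are sequences of (hashable) characters.
print(similarity("hey here", "hey there"))

# Any sequence of hashable elements can be compared, e.g. lists of words.
print(similarity(["a", "b", "c"], ["a", "c"]))
```

The ratio is computed as 2*M/T, where M is the number of matched elements and T is the combined length of both sequences, so identical sequences give 1.0.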


56 questions
19 votes, 2 answers

How does Python's SequenceMatcher work?

I am a little puzzled by two different answers returned by SequenceMatcher depending on the order of the arguments. Why is that? Example showing that SequenceMatcher is not commutative: >>> from difflib import SequenceMatcher >>> SequenceMatcher(None, "Ebojfm…
user2399453
7 votes, 1 answer

Difflib's SequenceMatcher - Customized equality

I've been trying to create a nested or recursive effect with SequenceMatcher. The final goal is to compare two sequences, both of which may contain instances of different types. For example, the sequences could be: l1 = [1, "Foo", "Bar", 3] l2 = [1, "Fo",…
YaronK
6 votes, 2 answers

Making difflib's SequenceMatcher ignore "junk" characters

I have a lot of strings that I want to match for similarity (each string is 30 characters long on average). I found difflib's SequenceMatcher great for this task, as it was simple and gave good results. But if I compare hellboy and hell-boy like…
lovesh
5 votes, 2 answers

SequenceMatcher - finding the two most similar elements of two or more lists of data

I was trying to compare a set of strings to an already defined set of strings. For example, you want to find the addressee of a letter whose text has been digitized via OCR. There is an array of addresses, which has dictionaries as elements. Each…
valerius21
5 votes, 3 answers

Getting error while using fuzzywuzzy: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning

I am getting the error below. Is there any way to fix it without installing python-Levenshtein, and if not, how do I install python-Levenshtein on Linux? UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this…
Rishi Bansal
5 votes, 2 answers

difflib.SequenceMatcher isjunk argument not considered?

In the Python difflib library, is the SequenceMatcher class behaving unexpectedly, or am I misreading what the intended behavior is? Why does the isjunk argument seem to make no difference in this case? difflib.SequenceMatcher(None, "AA", "A…
bluelogic
4 votes, 2 answers

SequenceMatcher for multiple inputs, not just two?

I'm wondering about the best way to approach this particular problem and whether any libraries exist for it (Python preferably, but I can be flexible if need be). I have a file with a string on each line. I would like to find the longest common patterns and their…
Peck
4 votes, 3 answers

Determine where documents differ with Python

I have been using the Python difflib library to find where two documents differ. The Differ().compare() method does this, but it is very slow: at least 100x slower than the diff command for large HTML documents. How can I efficiently determine…
hoju
3 votes, 1 answer

How does Python 3.6 SequenceMatcher().get_matching_blocks() work?

I am trying to use SequenceMatcher.ratio() to get the similarity of two strings: "86418648" and "86488648": >>> SequenceMatcher(None,"86418648","86488648").ratio() 0.5 The ratio returned is 0.5, which is much lower than I expected because there is…
Jessie
3 votes, 5 answers

Comparing two columns of a csv and outputting string similarity ratio in another csv

I am very new to Python programming. I am trying to take a CSV file that has two columns of string values and compare the similarity ratio of the strings in the two columns. Then I want to take the values and output the ratio in another…
Jimmy
3 votes, 1 answer

Python: Passing SequenceMatcher in difflib an "autojunk=False" flag yields error

I am trying to use the SequenceMatcher class in Python's difflib module to identify string similarity. I have experienced strange behavior with it, though, and I believe my problem may be related to the module's "junk" filter, a problem…
duhaime
2 votes, 1 answer

How to compare each array in a set of binary arrays to an array that is outside the set

I have a set of arrays. I also have a separate array (T) to compare each array in the set against. I've tried to use SequenceMatcher to do this but can't figure out how to loop it so that each array from the set gets compared to T. This is for a fitness…
badam
2 votes, 2 answers

Is there an equivalent to Python's SequenceMatcher in SQL Server to join on columns that are similar?

In Python there is a nice built-in function that lets me check the difference between two string sequences. Example below: from difflib import SequenceMatcher def similar(a, b): return SequenceMatcher(None, a,…
Martin Bobak
2 votes, 1 answer

Python: Comparing text files for similar or equal lines

I have two text files. My goal is to find the lines in file First.txt that are not in Second.txt and output those lines to a third text file, Missing.txt. I have that done: fn = "Missing.txt" try: fileOutPut = open(fn, 'w') except IOError: …
Fidycent
2 votes, 1 answer

Working of methods set_seq1 and set_seq2, difflib Python

I have checked the docs of difflib and I'm confused about how difflib.SequenceMatcher.ratio() actually works. Consider this: s = difflib.SequenceMatcher(None, "hey here" , "hey there").ratio() print s gives s = 0.9411764705882353 I wanted to know…
Hypothetical Ninja