Questions tagged [sequencematcher]

For questions about SequenceMatcher from Python's difflib module. This is a flexible class for comparing pairs of sequences of any type, as long as the sequence elements are hashable. difflib is part of the Python standard library.
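As a minimal sketch of basic usage (the input values are illustrative and not taken from any question listed here), ratio() returns a similarity score between 0 and 1, and the same class accepts lists of hashable items as well as strings:

```python
from difflib import SequenceMatcher

# Compare two strings. ratio() is 2*M/T, where M is the number of
# matched elements and T is the combined length of both sequences.
string_score = SequenceMatcher(None, "abcd", "bcde").ratio()

# Any sequence of hashable elements works, for example lists of ints.
list_score = SequenceMatcher(None, [1, 2, 3], [1, 2, 4]).ratio()

print(string_score, list_score)
```

The first argument is the optional isjunk callable; passing None means no element is treated as junk.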

56 questions
1
vote
2 answers

SequenceMatcher: Recording no match just once?

I am using SequenceMatcher to find a set of words within a group of texts. The problem I am having is that I need to record when it does not find a match, but only once per text. If I try an if statement, it gives me a result each time the comparison…
Connie
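One possible way to record a miss only once per text is to reduce all word-level comparisons for a text to a single boolean before recording anything. The sketch below is not the asker's code: the function names, the 0.8 threshold, and the sample data are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

def has_match(text, targets, threshold=0.8):
    """Return True if any word in text is close to any target word."""
    words = text.lower().split()
    return any(
        SequenceMatcher(None, word, target.lower()).ratio() >= threshold
        for word in words
        for target in targets
    )

def texts_without_match(texts, targets, threshold=0.8):
    """Return the index of each text with no match, one entry per text."""
    return [
        index
        for index, text in enumerate(texts)
        if not has_match(text, targets, threshold)
    ]

# Hypothetical sample data.
texts = ["the quick brown fox", "nothing relevant here"]
targets = ["fox", "quik"]
misses = texts_without_match(texts, targets)
print(misses)
```

Because any() collapses every comparison for a text into one result, the "no match" case is evaluated once per text rather than once per comparison.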
1
vote
3 answers

How to detect sequences in an interleaved log file

I would like to match patterns from a given pattern library, returning the longest detected patterns. However I only have the interleaved result of multiple parallel tasks in a log file, e.g. from multiple cores of a processor. Is this a known…
1
vote
1 answer

Drop similar text rows of one column in Python

import pandas as pd from difflib import SequenceMatcher df = pd.DataFrame({"id":[9,12,13,14], "text":["Error number 609 at line 10", "Error number 609 at line 22", "Error string 'foo' at line 11", "Error string 'bar' at line…
ah bon
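A rough sketch of one approach to dropping near-duplicate rows: keep a row only if it is not too similar to any row already kept. It works on a plain list so that it stays self-contained; with pandas, the returned positions could be passed to df.iloc. The function name and the 0.9 threshold are assumptions, and only the three complete strings from the excerpt are used.

```python
from difflib import SequenceMatcher

def drop_similar_indices(texts, threshold=0.9):
    """Return positions of texts to keep, skipping near-duplicates."""
    kept = []
    for index, text in enumerate(texts):
        is_duplicate = any(
            SequenceMatcher(None, text, texts[k]).ratio() >= threshold
            for k in kept
        )
        if not is_duplicate:
            kept.append(index)
    return kept

texts = [
    "Error number 609 at line 10",
    "Error number 609 at line 22",
    "Error string 'foo' at line 11",
]
kept = drop_similar_indices(texts)
print(kept)
```

This greedy pass is quadratic in the number of rows, so it suits small tables; the result also depends on row order, since the first row of each similar group is the one that survives.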
1
vote
3 answers

By how much percentage do the two strings match?

I have two columns of disease names and need to find the best match for each. I tried the SequenceMatcher class and the fuzzywuzzy module in Python, and the results were surprising. I have pasted the results and my questions below: Consider there is a…
1
vote
2 answers

Find match percentage between two strings also taking into consideration the order of the words - Python

I am looking for a way to output the match percentage between two strings (e.g. names) while also taking into consideration that they might be the same but with the words in a different order. I tried using SequenceMatcher() but the results are…
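One common workaround for word order, sketched here under the assumption that words are separated by whitespace and that case should be ignored, is to sort the tokens of each string before comparing. The helper name is hypothetical.

```python
from difflib import SequenceMatcher

def order_insensitive_ratio(a, b):
    """Compare two strings after lowercasing and sorting their words."""
    def normalize(s):
        return " ".join(sorted(s.lower().split()))
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

plain = SequenceMatcher(None, "John Smith", "Smith John").ratio()
sorted_score = order_insensitive_ratio("John Smith", "Smith John")
print(plain, sorted_score)
```

Sorting discards word order entirely, so this treats any permutation of the same words as identical; whether that is acceptable depends on the data.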
1
vote
0 answers

Sequence clustering in R

I'm trying to write a simple R sequence clustering/grouping/simplification solution. I'm rather a beginner, haven't used R for a while, so please forgive simple and stupid questions/solutions. Tasks are taken from SAP and they represent execution of…
1
vote
1 answer

How to delete invalid characters between multiple strings in Python?

I'm working on a project with OCR in Spanish. The camera captures different frames in a line of text. The line of text contains this: Este texto, es una prueba del dispositivo lector para no videntes. After some operations I get strings like…
Alex Ortega
1
vote
0 answers

Custom items for list alignment with SequenceMatcher

I am using SequenceMatcher to align two lists. Each list item is either a tuple or an integer. The requirement is that a tuple containing a particular integer is considered equal to that integer. For example: (1, 2, 3) == 1 #True (1, 2, 3) == 2 #True To do…
jalal
0
votes
1 answer

Is SequenceMatcher supported by Chaquopy?

Does Chaquopy support from difflib import SequenceMatcher directly, or does a package have to be installed with pip first? If so, which pip package provides SequenceMatcher?
0
votes
1 answer

How do I match on the best SequenceMatcher ratio?

I use the SequenceMatcher ratio to match two dataframes on the best ratio. I want to first check whether the score between A and AA is good, then whether the score between B and BB is good, then whether the score between C and CC is good, and then add the line …
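A minimal sketch of picking the best-scoring candidate for a single value, which could then be applied column by column. The function name and the sample words are assumptions; the standard library also provides difflib.get_close_matches for a similar purpose.

```python
from difflib import SequenceMatcher

def best_match(query, candidates):
    """Return (candidate, score) for the candidate most similar to query."""
    scored = [
        (candidate, SequenceMatcher(None, query, candidate).ratio())
        for candidate in candidates
    ]
    # max() keeps the first candidate when scores tie.
    return max(scored, key=lambda pair: pair[1])

# Hypothetical sample data.
name, score = best_match("appel", ["apple", "banana", "grape"])
print(name, score)
```

A threshold check on the returned score (for example, accepting the pair only above some cutoff) can then decide whether the match counts as "good".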
0
votes
0 answers

fuzzywuzzy token sort vs difflib SequenceMatcher

I am trying to figure out the difference between the two. I get the same results (similarity scores) using the two for the same strings. Can somebody please explain the difference between the two using the formula for each of them? Any idea if one…
0
votes
2 answers

Find common fragments in multiple strings using SequenceMatcher

I would like to find the common string among: strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test'] The following code returns only the first part "PS1 1"; I would expect the result to be "PS1 Test". Could you help me? Is it possible…
Elka
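One way to get "PS1 Test" from this input, sketched under the assumption that fragments are whole whitespace-separated words, is to compare lists of words instead of characters and fold the matching blocks across all strings. The helper names are hypothetical, and the function expects at least one string.

```python
from difflib import SequenceMatcher
from functools import reduce

def common_tokens(a, b):
    """Return the tokens of a that fall inside blocks matching b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    common = []
    for block in matcher.get_matching_blocks():
        common.extend(a[block.a:block.a + block.size])
    return common

def common_fragment(strings):
    """Fold common_tokens over the word lists of all strings."""
    token_lists = [s.split() for s in strings]
    return " ".join(reduce(common_tokens, token_lists))

strings_list = ['PS1 123456 Test', 'PS1 758922 Test', 'PS1 978242 Test']
result = common_fragment(strings_list)
print(result)
```

Comparing characters instead of words can also pick up stray single-character matches between the numeric parts, which is why this sketch tokenizes first.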
0
votes
1 answer

Similarity ratio from a list of excluded strings

In comparing the similarity of two strings, I want to exclude a list of strings, for example, ignoring 'Texas' and 'US'. I tried to use the argument 'isjunk' in difflib's SequenceMatcher: exclusion = ['Texas', 'US'] sr = SequenceMatcher(lambda x: x in…
Mark K
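isjunk is called on individual sequence elements, which for strings are single characters, so a test against whole words such as 'Texas' never matches. A possible alternative, sketched here with a hypothetical helper and sample strings, is to drop the excluded words before comparing:

```python
from difflib import SequenceMatcher

def ratio_excluding(a, b, excluded):
    """Compare a and b after removing excluded words (case-insensitive)."""
    excluded_set = {word.lower() for word in excluded}

    def strip_words(s):
        return " ".join(w for w in s.split() if w.lower() not in excluded_set)

    return SequenceMatcher(None, strip_words(a), strip_words(b)).ratio()

exclusion = ['Texas', 'US']
score = ratio_excluding("Austin Texas US", "Austin", exclusion)
print(score)
```

This assumes the excluded strings appear as whole whitespace-separated words; punctuation attached to a word (for example "Texas,") would need extra handling.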
0
votes
0 answers

Comparing strings in Python with tools such as SequenceMatcher and textdistance, and the differences in their algorithms

I am working with a dataframe which has 2 columns of city names which should be equal. But they are not due to administrative errors, spelling mistakes or name changes. I am trying to see when those city names are 'equal enough' to be assumed equal.…
Hestaron
0
votes
0 answers

Is there any solution when summing certain cell values gives extra values?

I am using Python 3 with pandas and SequenceMatcher to output the values of A.OUT-B.IN and A.OUT-C.IN. When the program calculates the value of the last two rows, it always raises ValueError: Must have equal len keys and value when setting…
Johann