
I am comparing string similarity between two lists of strings, and whenever the similarity function returns a value > 0 I lose the word2 value:

list1=['aaaa','cccc','bb']
list2=['aaa','fff','v']
for word1 in list1:
    for word2 in list2:
        if similar(word1, word2) > 0:
            print(word2)

similar is a small wrapper around SequenceMatcher:

def similar(a, b):
    s= SequenceMatcher(a,b).ratio()
    s=round(s*100,1)
    return s

If the result of 'similar' is > 0, then word2 becomes ''. If I check for similar(word1, word2) == 0 instead, the value stays correct.

bruno desthuilliers
theo
  • format & fix your code. what is in "similar" ?? – Jean-François Fabre Mar 27 '18 at 11:56
  • @Aran-Fey: even with formatting fixed this isn't valid python (`word2in`) – Jean-François Fabre Mar 27 '18 at 11:56
  • I know, but at least I can read it... Now that I can actually comprehend the code, I'm more comfortable casting a close vote. – Aran-Fey Mar 27 '18 at 11:56
  • Please provide a [mcve]. Describe what is input, current output and expected output. – user202729 Mar 27 '18 at 11:57
  • and fix your code since it's not valid: `for word2in set(list2)`. That's why we're asking for a [mcve]. People aren't trusting this version of your code – Jean-François Fabre Mar 27 '18 at 11:58
  • What does it mean to "loose the word2 value", and what does "my value stays right"? Without defining what's actually happening, and what you expect to have happen, it's impossible for anyone to answer this question. – Daniel Pryden Mar 27 '18 at 12:03
  • If `word2` is a `str` object then it's impossible that calling a function with `word2` as an argument will change the value of `word2`: `str` objects are immutable and functions can't rebind names in the caller's context. If `word2` is not a `str` object, you need to explain what kind of object it is. – Daniel Pryden Mar 27 '18 at 12:03
  • it's a string, the 2 lists contain only string elements – theo Mar 27 '18 at 12:06
  • @theo: Then I don't believe that your code is giving you the result you say it is. You need to show enough code so that I or someone else can use it to reproduce your result, or no one will be able to help you. – Daniel Pryden Mar 27 '18 at 12:08
  • :) ok, this is my first post, sorry. I've pasted the similar function and a sample of 2 lists. You should be able to replicate now – theo Mar 27 '18 at 12:10
  • I don't have a class named `SequenceMatcher`. Where does that come from? – Daniel Pryden Mar 27 '18 at 12:10
  • from difflib import SequenceMatcher – theo Mar 27 '18 at 12:12
  • How in the world did this incomprehensible mess of a question get 3 upvotes? Do I smell sock puppetry? – Aran-Fey Mar 27 '18 at 12:16
  • I've also tried to use an auxiliary variable before calculating similarity, just because I didn't look at how SequenceMatcher is built, but that didn't help. I tried the same code with fuzz.ratio instead of SequenceMatcher and my word2 doesn't become '' and works as needed. – theo Mar 27 '18 at 12:16
  • @theo: once and for all, there is __no way__ what you describe can happen - strings are immutable and a function cannot rebind names in its caller's namespace. Also both Daniel Pryden and myself tested your code (making sure to print `word1` and `word2` after the `similar()` call in my case) and, __of course__, `word2` is left unchanged by the call. I don't know where you got that idea from - that `word2` would become an empty string after the call - but it's just NOT happening. Period. – bruno desthuilliers Mar 27 '18 at 12:44
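
To illustrate the point made in the comments above, here is a minimal sketch showing that a function cannot change a `str` held by its caller. The helper name `try_to_clobber` is hypothetical and does not appear in the question:

```python
def try_to_clobber(text):
    # Rebinding the parameter only changes the local name inside this function.
    text = ''
    return text


word2 = 'aaa'
result = try_to_clobber(word2)

# The caller's variable still refers to the original, unmodified string.
print(repr(word2))   # 'aaa'
print(repr(result))  # ''
```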

1 Answer


Here is what I understand your code to be:

from difflib import SequenceMatcher

def similar(a, b):
    s = SequenceMatcher(a, b).ratio()
    s = round(s * 100, 1)
    return s


list1=['aaaa','cccc','bb']
list2=['aaa','fff','v']

for word1 in list1:
    for word2 in list2:
        if similar(word1, word2) > 0:
            print(word2)

Running this code doesn't print anything, because your similar function returns 0 every time.

If I print the return value of similar, I get this output:

Comparing "aaaa" and "aaa", got 0.0
Comparing "aaaa" and "fff", got 0.0
Comparing "aaaa" and "v", got 0.0
Comparing "cccc" and "aaa", got 0.0
Comparing "cccc" and "fff", got 0.0
Comparing "cccc" and "v", got 0.0
Comparing "bb" and "aaa", got 0.0
Comparing "bb" and "fff", got 0.0
Comparing "bb" and "v", got 0.0
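
A loop along these lines would produce that output. This is only a sketch: it reuses the question's `similar` as written (including its two-argument `SequenceMatcher` call), and the `results` list is added here purely for illustration:

```python
from difflib import SequenceMatcher


def similar(a, b):
    # Same call as in the question: a and b are passed positionally.
    s = SequenceMatcher(a, b).ratio()
    return round(s * 100, 1)


list1 = ['aaaa', 'cccc', 'bb']
list2 = ['aaa', 'fff', 'v']

results = []  # illustrative only: collects every score that gets printed
for word1 in list1:
    for word2 in list2:
        score = similar(word1, word2)
        results.append(score)
        print('Comparing "{}" and "{}", got {}'.format(word1, word2, score))
```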

However, if I print the values of the lists afterward, none of your strings have been touched:

print('Loop finished, contents of lists:')
print('list1: {}'.format(repr(list1)))
print('list2: {}'.format(repr(list2)))

Loop finished, contents of lists:
list1: ['aaaa', 'cccc', 'bb']
list2: ['aaa', 'fff', 'v']

Demo link: https://ideone.com/0Kuikm

Daniel Pryden
  • Thanks for the answers. After some more research, I found the root cause here: https://stackoverflow.com/questions/4802137/how-to-use-sequencematcher-to-find-similarity-between-two-strings The SequenceMatcher class has this constructor: `class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True)`. When calling `SequenceMatcher(a, b)`, `a` was passed as the value for `isjunk` and `b` as the value for `a`, leaving `b` at its default value `''`. – theo Apr 04 '18 at 08:46
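
Based on the constructor signature quoted in that comment, a corrected `similar` might look like the sketch below. Passing `None` explicitly for `isjunk` is one option (using the keyword arguments `a=` and `b=` should work as well); the `matches` list is an addition made here for illustration and is not part of the original code:

```python
from difflib import SequenceMatcher


def similar(a, b):
    # The first positional parameter is isjunk, so pass None for it explicitly.
    s = SequenceMatcher(None, a, b).ratio()
    return round(s * 100, 1)


list1 = ['aaaa', 'cccc', 'bb']
list2 = ['aaa', 'fff', 'v']

matches = []  # illustrative only: pairs whose similarity is above 0
for word1 in list1:
    for word2 in list2:
        if similar(word1, word2) > 0:
            matches.append((word1, word2))
            print(word2)
```

With the sample lists, only the pair 'aaaa' and 'aaa' shares any characters, so the loop prints aaa once.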