I want to find the most similar value from a dataframe column to a specified string, e.g. a = 'book'. Let's say the dataframe df looks like:
col1
wijk 00 book
Wijk a
test
Now I want to return wijk 00 book, since this is the most similar to a. I am trying to do this with the fuzzywuzzy package.
Therefore, I have a dataframe A whose col1 holds the strings I want to find the most similar value for. Then I use:
A['similar_value'] = A.col1.apply(lambda x: [process.extract(x, df.col1, limit=1)][0][0][0])
But when comparing a lot of strings, this takes too much time. Does anyone know how to do this faster?
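For anyone who wants to reproduce the setup without extra dependencies, here is a minimal sketch of the same lookup. It is not the fuzzywuzzy call above: it swaps in the standard library's difflib.SequenceMatcher as a stand-in scorer, so the scores will differ from fuzzywuzzy's, and plain lists stand in for df.col1 and A.col1. The names choices, queries and most_similar are made up for illustration. The lru_cache is only there to show one cheap saving, assuming the query column contains repeated values: each distinct string is then matched just once.

```python
from difflib import SequenceMatcher
from functools import lru_cache

# Candidate values, mirroring df.col1 from the question.
choices = ["wijk 00 book", "Wijk a", "test"]


@lru_cache(maxsize=None)
def most_similar(query):
    """Return the candidate with the highest SequenceMatcher ratio to query."""
    # Stand-in scorer (assumption): difflib instead of fuzzywuzzy.
    return max(choices, key=lambda c: SequenceMatcher(None, query, c).ratio())


# Hypothetical query values standing in for A.col1; the repeated
# "book" is answered from the cache instead of being re-scored.
queries = ["book", "book", "test"]
similar_values = [most_similar(q) for q in queries]
print(similar_values)
```

This still compares every distinct query against every candidate, so it only helps when the queries contain many duplicates.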