How to match rows when one column contains a string from another column?

Question

My aim is to find the rows where the City appears in the general_text column of the same row, but the match must be exact (whole words only).

I tried using Python's `in` operator, but it doesn't give the expected results, so I tried `str.contains`, but the way I'm using it raises an error. Any hints on how to do this properly or efficiently?

I based my code on "Filtering out rows that have a string field contained in one of the rows of another column of strings":

df['matched'] = df.apply(lambda x: x.City in x.general_text, axis=1)

but it marks every row of the data below as a match:

import pandas as pd

data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]

df = pd.DataFrame(data, columns=['general_text', 'City'])

# raises AttributeError: inside apply(axis=1) each cell is a plain str, which has no .str accessor
df['match'] = df.apply(lambda x: x['general_text'].str.contains(x['City']), axis=1)

What I would like to get from the code above is a match only for these rows:

data = [['palm springs john smith', 'palm springs'],
        ['hamptons amagansett', 'amagansett'],
        ['edward riverwoods lake', 'riverwoods']]
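Both attempts above go wrong for different reasons, which a few lines using values from the data make visible (a small sketch, not part of the original question): `in` between two strings is a substring test, so `'spring'` is found inside `'springs'`; and inside `apply(..., axis=1)` each cell is a plain Python `str`, which has no `.str` accessor, whereas `Series.str.contains` tests one pattern against a whole column.

```python
import pandas as pd

text = 'palm springs john smith'

# `in` is a substring test, so partial words count as matches
print('spring' in text)                 # True ('spring' is inside 'springs')
print('gans' in 'hamptons amagansett')  # True ('gans' is inside 'amagansett')

df = pd.DataFrame({'general_text': [text, 'hamptons amagansett'],
                   'City': ['smith', 'hampton']})

# Inside apply(axis=1) each cell is a plain str, so `.str` does not exist on it
first_row = df.iloc[0]
print(hasattr(first_row['general_text'], 'str'))  # False

# Series.str.contains applies ONE pattern to every row of a column
print(df['general_text'].str.contains(r'\bsmith\b').tolist())  # [True, False]
```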

jezrael · Accepted answer · answered Sep 16 '19 at 05:24, edited at 05:32


Use word boundaries (`\b` before and after the city) for an exact, whole-word match:
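To see what `\b` does on its own, here is a short sketch with strings from the question. `\b` matches the position between a word character and a non-word character (or the start/end of the string), so `spring` followed by `s` is rejected while `smith` at the end of the text is accepted:

```python
import re

text = 'palm springs john smith'

# 'spring' is followed by 's', a word character, so there is no boundary after it
print(bool(re.search(r'\bspring\b', text)))        # False
# a multi-word city works too: boundaries are checked only at its two ends
print(bool(re.search(r'\bpalm springs\b', text)))  # True
# the end of the string counts as a boundary
print(bool(re.search(r'\bsmith\b', text)))         # True
```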

import re

# re.escape makes regex metacharacters in the city name (e.g. '.') match literally
f = lambda x: bool(re.search(r'\b{}\b'.format(re.escape(x['City'])), x['general_text']))

Or:

f = lambda x: bool(re.findall(r'\b{}\b'.format(re.escape(x['City'])), x['general_text']))

df['match'] = df.apply(f, axis=1)
print(df)
              general_text          City  match
0  palm springs john smith        spring  False
1  palm springs john smith  palm springs   True
2  palm springs john smith         smith   True
3      hamptons amagansett    amagansett   True
4      hamptons amagansett       hampton  False
5      hamptons amagansett          gans  False
6   edward riverwoods lake          wood  False
7   edward riverwoods lake    riverwoods   True
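Since the question also asks about efficiency: `apply(axis=1)` builds a Series for every row, so a commonly used alternative is a list comprehension over `zip` of the two columns, which works on plain strings and is typically faster on large frames. This is a sketch of that alternative, not the answer's code; `re.escape` is an addition so that a city such as `st. louis` does not let `.` match any character:

```python
import re
import pandas as pd

data = [['palm springs john smith', 'spring'],
        ['palm springs john smith', 'palm springs'],
        ['palm springs john smith', 'smith'],
        ['hamptons amagansett', 'amagansett'],
        ['hamptons amagansett', 'hampton'],
        ['hamptons amagansett', 'gans'],
        ['edward riverwoods lake', 'wood'],
        ['edward riverwoods lake', 'riverwoods']]
df = pd.DataFrame(data, columns=['general_text', 'City'])

# Same \b...\b test as above, but iterating over plain Python strings
# instead of constructing a row Series for each row
df['match'] = [bool(re.search(r'\b{}\b'.format(re.escape(city)), text))
               for text, city in zip(df['general_text'], df['City'])]
print(df)
```

It produces the same `match` column as the `apply` version for this data.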


Both work perfectly and give the same result, but what is the difference between `re.search` and `re.findall`? – KWar Sep 16 '19 at 08:28
@KWar - You can check [this](https://teamtreehouse.com/community/what-is-the-difference-between-research-and-refindall-what-is-the-meaning-of-reverbose). Here they behave the same because `re.search` returns `None` when nothing matches and `bool(None)` is `False`, while `re.findall` returns an empty list `[]` and `bool([])` is also `False`. – jezrael Sep 16 '19 at 08:31
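A small illustration of the difference, using strings from the question: `re.search` returns a match object for the first hit (or `None`), while `re.findall` returns a list of every non-overlapping match (possibly empty). For a plain yes/no test, `re.search` is usually the better fit because it can stop at the first match instead of scanning for all of them.

```python
import re

text = 'hamptons amagansett'
pattern = r'\bhampton\b'

# No whole-word match: search gives None, findall gives an empty list
print(re.search(pattern, text))   # None
print(re.findall(pattern, text))  # []
# Both are falsy, which is why wrapping either in bool() gives the same answer
print(bool(re.search(pattern, text)), bool(re.findall(pattern, text)))  # False False

# With several hits, findall collects all of them
print(re.findall(r'\bs\w+', 'palm springs john smith'))  # ['springs', 'smith']
```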
