0

I have 2 dataframes df1 and df2.

I would like to get all rows in df1 that has exact string match in column B of df2 This is df1:

df1={"columnA":['apple,cherry','pineple,lemon','banana, pear','cherry, pear, lemon']} 
df1=pd.DataFrame(df1)

This is df2:

df2={"columnB":['apple','cherry']}
df2=pd.DataFrame(df2)

Below code output incorrect result:

df1[df1['columnA'].str.contains('|'.join(df2['columnB'].values))]

enter image description here

Pineapple is not supposed to appear as this is not exact match.

How can i get result like this:

enter image description here

CCCCos
  • 57
  • 5

3 Answers3

2

Without actual reproducible code it's harder to help you, but I think this should work:

words = [rf'\b{string}\b' for string in df2.columnB]
df1[df1['columnA'].str.contains('|'.join(words))]
Juan C
  • 3,992
  • 1
  • 10
  • 33
0
df1={"columnA":['apple,cherry','pineple,lemon','banana, pear','cherry, pear, lemon']} 
df1=pd.DataFrame(df1)
    
df2={"columnB":['apple','cherry']}
df2=pd.DataFrame(df2)

Larger way of doing it ,but correct and simpler

list1=[]
for i in range(0,len(df1)):
    for j in range(0,len(df2)):
        if(df2["columnB"][j] in df1["columnA"][i]):
            list1.append(i)
            break
df=df1.loc[list1]

   

Answer

ColumnA
0   apple,cherry
3   cherry, pear, lemon   
Zesty Dragon
  • 477
  • 2
  • 17
-1

You were very close, but you will need to apply the word-operator of regex:

df1[df1['columnA'].str.contains("\b(" + '|'.join(df2['columnB'].values) + ")\b")]

This will look for the complete words.

JarroVGIT
  • 1,458
  • 1
  • 9
  • 13