I have two dataframes df1
and df2
. df1
has a column called 'comments'
that contains a string. df2
has a column called 'labels'
that contains smaller strings. I am trying to write a function that searches df1['comments']
for the strings contained in df2['labels']
and creates a new variable for d1
called df1['match']
that is True
if df1['comments']
contains any of the strings in df2['labels']
and False
if df1['comments']
does not contain any of the strings in df2['labels']
.
I'm trying to use df.str.contains('word', na=False)
to solve this problem and I have managed to create the column df1['match']
searching for one specific string using the following function:
df1['match'] = df1['comment'].str.contains('mystring', na=False)
However, I struggle to write a function that iterates over all the words in df2['label']
and creates a df1['match']
with True
if any of the words in df2['label']
are present and False
otherwise.
This is my attempt at writing the loop:
for comment in df1['comment']:
for word in df2['label']:
if df1['comment'].str.contains(word, na=False)=True:
df1['match']=True
#(would need something to continue to next comment if there is a match)
else:
df1['match']=False #(put value as false if there none of the items in df2['label' is contained in df1['comment']``
Any help would be greately appreciated.