I have a dataframe (df) with a "description" column. I would like to extract every row whose description partially matches any item in a list (mylist).

df
------------------
id   description
111    abcxyz
212    ab10yz
203    abcdd9
442    ab00-z
554    a12x0z
697    a9901z


mylist: ['ab','yz']

There are similar questions, but they mostly focus on fully matching the list items against a dataframe column.

I want to match the items in mylist against the description column as substrings and return the rows where a match is found as a dataframe.

Expected result as a dataframe:

------------------
id   description
111    abcxyz
212    ab10yz
203    abcdd9
442    ab00-z

I have tried several solutions. Two of them are shown below:

df[df.description.str.contains('|'.join(mylist))]

df[df['description'].str.contains(mylist)]

The first line above resulted in:

   KeyError: '[nan nan nan ... nan nan nan] not in index'

The second line of code results in:

   TypeError: unhashable type: 'list'
Hanif

1 Answer

You can use a regex; "Series.str.contains" (reached through df['description'].str) already supports that:

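# Join the list items into one alternation; the group is non-capturing
# so that str.contains does not warn about match groups.
pt = '(?:{})'.format('|'.join(mylist))
df[df['description'].str.contains(pt, regex=True)]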
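
The KeyError in the question mentions nan values, which suggests (this is an assumption, the question does not show it) that the real description column contains missing values. In that case str.contains returns NaN for those rows and the result can no longer be used as a boolean mask. Below is a minimal sketch that rebuilds the sample data, adds one hypothetical row with a missing description to illustrate the point, and passes na=False so missing values count as "no match". The re.escape call is an extra precaution, not something the question asks for, so that list items containing regex metacharacters are matched literally.

```python
import re

import pandas as pd

# Sample data from the question, plus one hypothetical row (id 999)
# with a missing description to show the effect of na=False.
df = pd.DataFrame({
    "id": [111, 212, 203, 442, 554, 697, 999],
    "description": ["abcxyz", "ab10yz", "abcdd9", "ab00-z",
                    "a12x0z", "a9901z", None],
})
mylist = ["ab", "yz"]

# Escape each item so it is treated as literal text, then OR them together.
pattern = "|".join(re.escape(item) for item in mylist)

# na=False turns missing descriptions into False instead of NaN,
# so the mask is strictly boolean.
mask = df["description"].str.contains(pattern, regex=True, na=False)
result = df[mask]
print(result)
```

With the sample data, result holds the rows with ids 111, 212, 203 and 442. The second attempt in the question fails for a different reason: str.contains expects a single pattern string, not a list, which is why the items are joined first.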
SEDaradji