0

I have this dataframe-

     Name  Age
0    Alex   10
1     NaN   12
2  Clarke   13
3    Lexy   14
4   Marie   10

I need to subset the dataframe based on Name column. I want to only keep rows having name like %lex% or %clarke% or %bob% (case-insensitive). I have tried df.loc and df.query but it's way too long. Is there a better way to do this?

What I've tried so far-

df.loc[df['Name'].str.contains('lex', na=False, case=False) | df['Name'].str.contains('clarke', na=False, case=False)]
df.query('Name.str.contains("lex", na=False, case=False) or Name.str.contains("clarke", na=False, case=False)', engine='python')

Minimum reproducible code-

import pandas as pd
import numpy as np
data = [['Alex', 10], [np.nan, 12], ['Clarke', 13], ['Lexy', 14], ['Marie', 10]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

Expected output-

     Name  Age
0    Alex   10
2  Clarke   13
3    Lexy   14
Shradha
  • 1,468
  • 1
  • 5
  • 22

1 Answers1

1

So you can do

out = df[df['Name'].str.contains('lex|clarke', na=False, case=False)]
Out[5]: 
     Name  Age
0    Alex   10
2  Clarke   13
3    Lexy   14
BENY
  • 258,262
  • 17
  • 121
  • 165