I have a DataFrame like this:

col1  col2  col3  col4  col5  col6
abc    bc   eg     egg   123    NA
frog   dog  fox    cat   ac     aa
12     7    87     ch    25      1
bc     79   09     123   NA     89    
...
...

I want to select every row in which any column contains a specific string, to get a subset of the DataFrame.

For example, I want all rows that contain 'bc'. I know how to do this for a single column:

df.loc[df.col1.str.contains('bc', na=False)]

But how can I search all columns at once? My original DataFrame has more than 200 columns.

I tried:

for c, dtype in zip(df.columns, df.dtypes):
    if dtype == object:  # np.object was removed in NumPy 1.24; plain object is equivalent
        df = df.loc[df[c].str.contains("bc", na=False)]

But this returns only the column names, with no rows.

The final result should look like this:

col1  col2  col3  col4  col5  col6
abc    bc   eg     egg   123    NA
bc     79   09     123   NA     89    
...
...

All I want is the subset of rows from the original DataFrame that contain 'bc' in any column.

Jiayu Zhang
  • Just call `str.contains` inside `apply`. See the answer in the dupe. – cs95 Jul 01 '19 at 15:35
  • Hi @cs95, thanks for your help. I have edited my question. It is not a duplicate. – Jiayu Zhang Jul 01 '19 at 16:46
  • Did you check apply and str.contains? Why does it not work for you? `df.astype(str).apply(lambda x: x.str.contains('bc')).any()` – cs95 Jul 01 '19 at 16:48
  • @cs95 It showed me an error: `IndexError: boolean index did not match indexed array along dimension 0; dimension is 143 but corresponding boolean dimension is 106` – Jiayu Zhang Jul 01 '19 at 16:52
  • What about `df[df.astype(str).apply(lambda x: x.str.contains('bc')).any(axis=1)]`? – cs95 Jul 01 '19 at 16:58
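
The last comment's suggestion can be sketched as a small helper. This is a minimal illustration, not a tested answer to the original 200-column data: the sample frame is rebuilt by hand from the table in the question, and the function name `rows_containing` is made up for this example. The key point is `any(axis=1)`, which reduces across columns to give one boolean per row. The earlier `.any()` without `axis` reduces per column instead, so its result cannot be used as a row mask, which is probably what triggered the `IndexError`.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the table in the question (hand-typed, illustrative only).
df = pd.DataFrame(
    {
        "col1": ["abc", "frog", 12, "bc"],
        "col2": ["bc", "dog", 7, 79],
        "col3": ["eg", "fox", 87, "09"],
        "col4": ["egg", "cat", "ch", 123],
        "col5": [123, "ac", 25, np.nan],
        "col6": [np.nan, "aa", 1, 89],
    }
)


def rows_containing(frame, needle):
    """Return the rows of `frame` where any cell contains `needle` (hypothetical helper)."""
    # Cast every column to text so that .str works on numeric columns too.
    as_text = frame.astype(str)
    # One boolean column per original column; regex=False does a plain substring match.
    hits = as_text.apply(lambda col: col.str.contains(needle, regex=False, na=False))
    # axis=1 collapses across columns, leaving one True/False per row.
    return frame[hits.any(axis=1)]


subset = rows_containing(df, "bc")
print(subset)
```

One caveat with this approach: `astype(str)` may turn missing values into the literal text "nan", so searching for a needle such as 'na' could match missing cells. If that matters, restricting the search to `frame.select_dtypes(include="object")` is one possible alternative.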