Using str.contains on pandas dataframe

Question

This pandas python code generates the error message,

"TypeError: bad operand type for unary ~: 'float'"

I have no idea why, because I'm trying to manipulate a str object:

df_Anomalous_Vendor_Reasons[~df_Anomalous_Vendor_Reasons['V'].str.contains("File*|registry*")]  # filters, leaving only cases where reason is NOT File or Registry

Anybody got any ideas?

Can you post what happens with `df_Anomalous_Vendor_Reasons['V'].str.contains("File*|registry*")`, also do you need the asterisks here? — EdChum, Jul 31 '15 at 11:55
I can't seem to reproduce. Can you post `df_Anomalous_Vendor_Reasons.to_msgpack()` for us? — Mike Graham, Jul 31 '15 at 12:08
@MikeGraham: are you sure about that? `regex=True` is the default for `str.contains` in 0.16.2, anyway, and it seems to have used a regex compile for years. — DSM, Jul 31 '15 at 12:56
Yeah the asterisks are just there to make it treat File and Registry as substrings — Davtho1983, Jul 31 '15 at 13:05
Ok EdChum - when I used your code I got: `ValueError: cannot index with vector containing NA / NaN values` — Davtho1983, Jul 31 '15 at 13:07
Ok Mike Graham - when I substitute in your code I get the same error: `ValueError: cannot index with vector containing NA / NaN values` — Davtho1983, Jul 31 '15 at 13:09
Could it be something as simple as having NaN values in my df? — Davtho1983, Jul 31 '15 at 13:22
Ok it was that. Sorry, I'm very new - I've been doing this a month and I've launched into pandas with no background even in python - so I need a lot of help — Davtho1983, Jul 31 '15 at 13:42

Josh · Accepted Answer · 2018-08-08T18:26:23.180

Credit to Davtho1983's comment above; I thought I'd add some color to it for clarity.

For anyone stumbling on this later with the same error (like me): it's a very simple fix. The pandas documentation shows

Series.str.contains(pat, case=True, flags=0, na=nan, regex=True)

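What's happening is that contains() isn't applied to NA values in the column; they stay NA in the result. Because NaN is a float, the result is no longer purely Boolean, and the invert operator ~ fails on it. You just need to fill the NA values with a Boolean value, via the na argument, so that you can use ~.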
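To see this for yourself, here is a minimal sketch. The Series `v` and its values are made up for illustration (they are not the asker's data), and object dtype is forced on the assumption that the original column held plain Python strings plus NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'V' column (not the asker's real data).
# dtype=object is an assumption: strings and NaN stored in an object column.
v = pd.Series([np.nan, "File not found", "Network timeout"], dtype=object)

# na is left at its default, so the missing entry stays missing in the result.
mask = v.str.contains("File")
print(mask.tolist())

# NaN is a float, so inverting the mask fails on that element.
try:
    ~mask
    invert_failed = False
except TypeError as exc:
    invert_failed = True
    print("TypeError:", exc)
```

The first entry of the mask is NaN rather than True or False, which is what trips up both ~ and Boolean indexing.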

With the example above one should use

df_Anomalous_Vendor_Reasons[~df_Anomalous_Vendor_Reasons['V'].str.contains("File*|registry*", na=False)]

Of course one should choose False or True for the na argument based on intended logic. Whichever Boolean value you choose for filling na will be inverted.
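As a rough illustration of how the choice plays out, here is a small sketch on made-up data. The frame `df`, its values, and the names `keep_missing` and `drop_missing` are hypothetical. The pattern also drops the asterisks, on the assumption that plain substring matching was intended: in a regex, `File*` means "Fil" followed by zero or more "e", not "File followed by anything".

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for df_Anomalous_Vendor_Reasons.
df = pd.DataFrame(
    {
        "V": pd.Series(
            ["File not found", "registry key changed", np.nan, "Network timeout"],
            dtype=object,
        )
    }
)

pattern = "File|registry"  # substring match; no asterisks needed

# na=False: missing values count as "does not match", so after ~ they are kept.
keep_missing = df[~df["V"].str.contains(pattern, na=False)]

# na=True: missing values count as "matches", so after ~ they are dropped.
drop_missing = df[~df["V"].str.contains(pattern, na=True)]

print(keep_missing.index.tolist())
print(drop_missing.index.tolist())
```

With na=False the row with the missing reason survives the filter; with na=True it is removed along with the File and registry rows.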

goooood answer - i was quite confused by this — vagabond, Apr 27 '17 at 15:21
