I'm trying to filter a dataframe based on several conditions. Then, I want to drop that subset from a separate, much larger dataframe.

import re
import numpy as np
import pandas as pd

df = pd.DataFrame({ 'A' : ['UNKNOWN','UNK','TEST','TEST'],
                    'E' : pd.Categorical(["test","train","test","train"]),
                    'F' : 'foo' })

df2 = pd.DataFrame({ 'A' : ['UNKNOWN','UNK','TEST','TEST','UNKOWN','UNKKK'],
                    'E' : pd.Categorical(["test","train","test","train",'train','train']),
                    'D' : np.array([3] * 6,dtype='int32'),
                    'F' : 'foo' })

rgx = r'UNKNOWN|UNK'
df_drop = df.loc[df['A'].str.contains(rgx, na=False, flags=re.IGNORECASE, regex=True, case=False)]
df2 = df2[~df_drop]

I want the following output for df2:

         A  D      E    F
2     TEST  3   test  foo
3     TEST  3  train  foo

Instead I get the following error:

TypeError: bad operand type for unary ~: 'str'

The reason I am not filtering df2 directly is that I want to make df_drop its own separate dataframe in order to retain the records that I have dropped.

I think I'm misunderstanding how the unary `~` operator is supposed to work, or I made a syntax error. But I can't find it, and none of the previous solutions (for instance, removing NaNs from the dataframe) seem to be applicable here.

ale19
  • Part of my original comment still stands: what you passed was not a boolean mask, it was a df. Even if you had taken a mask based on `df`, it would still have failed because the mask length is different from `df2`'s length. – EdChum Dec 20 '16 at 16:27
  • That makes sense. I didn't realize I needed a boolean mask, instead of a dataframe. I see the problem now. Thank you! – ale19 Dec 20 '16 at 16:28
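
The two points from the comments can be checked with a minimal sketch. It uses trimmed copies of the question's frames (columns D and E are left out), and `dtype=object` is an assumption added here so the strings stay plain Python objects. The names `mask_small` and `invert_error` are illustrative only.

```python
import pandas as pd

# Trimmed copies of the question's frames (columns D and E omitted).
# dtype=object is an assumption made for this sketch.
df = pd.DataFrame({'A': ['UNKNOWN', 'UNK', 'TEST', 'TEST'], 'F': 'foo'},
                  dtype=object)
df2 = pd.DataFrame({'A': ['UNKNOWN', 'UNK', 'TEST', 'TEST', 'UNKOWN', 'UNKKK'],
                    'F': 'foo'}, dtype=object)

# str.contains returns a boolean Series: this is the mask.
mask_small = df['A'].str.contains(r'UNKNOWN|UNK', na=False, case=False)

# Indexing with the mask returns the matching rows: a DataFrame of strings.
df_drop = df.loc[mask_small]

# ~ is applied to every cell of df_drop, and strings do not support it.
invert_error = None
try:
    ~df_drop
except TypeError as exc:
    invert_error = exc

print(type(df_drop).__name__, mask_small.dtype)
print('inverting df_drop failed:', invert_error)

# A mask built from df has one entry per row of df, not per row of df2.
print(len(mask_small), len(df2))
```

So `~` is meaningful on the boolean Series, not on the rows selected with it, and a mask meant to filter `df2` has to be computed from `df2`.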

1 Answer

I think you need to build the boolean mask from the big dataframe (`df2`) itself and filter with it:

rgx = r'UNKNOWN|UNK'
mask = df2['A'].str.contains(rgx, na=False, flags=re.IGNORECASE, regex=True, case=False)
print (mask)
0     True
1     True
2    False
3    False
4     True
5     True
Name: A, dtype: bool

print (df2[~mask])
      A  D      E    F
2  TEST  3   test  foo
3  TEST  3  train  foo
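
Since the question also asks to retain the dropped records, the same mask can be used twice: once as-is for the dropped rows and once inverted for the rows to keep. This is a minimal sketch of that idea, not part of the original answer; the name `df_keep` is introduced here for illustration, and only `case=False` is passed because it already makes the match case-insensitive.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': ['UNKNOWN', 'UNK', 'TEST', 'TEST', 'UNKOWN', 'UNKKK'],
                    'E': pd.Categorical(['test', 'train', 'test', 'train',
                                         'train', 'train']),
                    'D': np.array([3] * 6, dtype='int32'),
                    'F': 'foo'})

# Boolean mask computed on df2, so it has one entry per row of df2.
mask = df2['A'].str.contains(r'UNKNOWN|UNK', na=False, case=False)

df_drop = df2[mask]    # records being removed, kept as their own dataframe
df_keep = df2[~mask]   # remaining records (hypothetical name)

print(df_drop)
print(df_keep)
```

Every row of `df2` ends up in exactly one of the two frames, so nothing is lost by the split.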
jezrael