Filter dataframe based on difference between two series, one mapped via dictionary

Question

I have my dictionary

d = {'A':1, 'B':2, 'C':3}

and my dataframe

df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": [1, 2, 3],
    "col3": [2, 1, 4]})

I want to compare each value in df with the corresponding value in the dictionary. If it matches, the value is kept; otherwise the value is dropped.

I tried

m = df['col2'] >= d[df['col1']]
df.where(m, df, other = "")

But I get this error for m: TypeError: 'Series' objects are mutable, thus they cannot be hashed...

Thank you for your help.

TypeError: 'Series' objects are mutable, thus they cannot be hashed — Billy, Nov 13 '18 at 15:44
Possible duplicate of ["Series objects are mutable and cannot be hashed" error](https://stackoverflow.com/questions/29700552/series-objects-are-mutable-and-cannot-be-hashed-error) — narendra-choudhary, Nov 13 '18 at 16:24

John R · Accepted Answer · 2018-11-13T16:24:26.867

1

Create a new column for comparison using apply, then blank out the values of col2 that fail the condition:

df['dict_col'] = df['col1'].apply(lambda k: d[k])

m = df['col2'] >= df['dict_col']

df['col2'] = df['col2'].where(m, "")
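The same idea can be sketched end to end for more than one column. This is a minimal sketch, not necessarily what the answer intended: it assumes the condition from the question (value >= mapped dictionary value), assumes that both col2 and col3 should be checked, and uses the illustrative names `lookup` and `result`, which do not appear in the original.

```python
import pandas as pd

d = {"A": 1, "B": 2, "C": 3}
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": [1, 2, 3],
    "col3": [2, 1, 4],
})

# One dictionary value per row, looked up from col1 (as in the answer).
lookup = df["col1"].apply(lambda k: d[k])

# Work on a copy so the original frame stays intact (an assumption of this sketch).
result = df.copy()
for col in ["col2", "col3"]:  # assumed list of columns to check
    keep = result[col] >= lookup
    # Series.where keeps values where `keep` is True and puts "" elsewhere;
    # the row itself is never dropped.
    result[col] = result[col].where(keep, "")

print(result)
```

Only the failing cell is blanked (here the value 1 in col3 for key "B"); every row is still present. Note that a column containing "" is no longer purely numeric.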

edited Nov 13 '18 at 16:24

answered Nov 13 '18 at 15:47

John R

Thanks! But the whole row is erased, not just the value. – Billy Nov 13 '18 at 16:07
Updated to replace those values with "" – John R Nov 13 '18 at 16:26

score 1 · Answer 2 · answered Nov 13 '18 at 16:00


You can use pd.Series.map with loc and Boolean indexing:

df = df.loc[df['col2'] >= df['col1'].map(d)]
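To make the difference between filtering rows and blanking values concrete, here is a small sketch. It assumes the sample data from the question and applies the condition to col3 (in the sample, every col2 value passes, so nothing would be removed); the names `mapped`, `mask`, `filtered`, and `blanked` are illustrative only.

```python
import pandas as pd

d = {"A": 1, "B": 2, "C": 3}
df = pd.DataFrame({
    "col1": ["A", "B", "C"],
    "col2": [1, 2, 3],
    "col3": [2, 1, 4],
})

# Series.map translates each key in col1 to its dictionary value.
mapped = df["col1"].map(d)
mask = df["col3"] >= mapped

# Boolean indexing with loc removes every row where the mask is False.
filtered = df.loc[mask]

# Series.where keeps all rows and replaces only the failing values.
blanked = df.assign(col3=df["col3"].where(mask, ""))

print(filtered)
print(blanked)
```

With loc, the row for key "B" disappears entirely; with where, the row stays and only its col3 value is replaced by an empty string.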

answered Nov 13 '18 at 16:00

jpp

But the whole row is erased, not just the value. – Billy Nov 13 '18 at 16:07
@Billy, Nope, that means either the data is wrong or you are applying the logic incorrectly. – jpp Nov 13 '18 at 16:10
I do not understand – Billy Nov 13 '18 at 16:39

score 1 · Answer 3 · answered Nov 13 '18 at 16:19


The hint is in the error message itself.

TypeError: 'Series' objects are mutable, thus they cannot be hashed.

df['col1'] is a Series object, which is mutable.

A Series cannot be hashed and hence cannot be used as a dictionary key. From the Python docs:

... dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys...

You are using a Series object as a dictionary key. One way to rewrite d[df['col1']] is:

[d[x] for x in df['col1']]
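A short sketch of the failure and two element-wise alternatives, assuming the dictionary from the question. The names `keys`, `error_message`, `via_list`, and `via_map` are illustrative, and the exact wording of the error may differ between pandas versions.

```python
import pandas as pd

d = {"A": 1, "B": 2, "C": 3}
keys = pd.Series(["A", "B", "C"])

# Indexing a dict with a whole Series tries to hash the Series and fails.
try:
    d[keys]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Look the keys up one at a time instead.
via_list = [d[x] for x in keys]  # plain Python list
via_map = keys.map(d)            # pandas Series, keeps the original index

print(error_message)
print(via_list)
print(via_map.tolist())
```

The list comprehension returns a plain list, while `Series.map` returns a Series aligned with the original index, which is convenient when the result is compared against another column.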


thanks, it's clear, I have understood my mistake (and reread the documentation). But the result of my condition is still applying to the whole row. I do not get why. – Billy Nov 13 '18 at 16:28
