49

this is a rather similar question to this question but with one key difference: I'm selecting the data I want to change not by its index but by some criteria.

If the criteria I apply return a single row, I'd expect to be able to set the value of a certain column in that row in an easy way, but my first attempt doesn't work:

>>> d = pd.DataFrame({'year':[2008,2008,2008,2008,2009,2009,2009,2009], 
...                   'flavour':['strawberry','strawberry','banana','banana',
...                   'strawberry','strawberry','banana','banana'],
...                   'day':['sat','sun','sat','sun','sat','sun','sat','sun'],
...                   'sales':[10,12,22,23,11,13,23,24]})

>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

>>> d[d.sales==24]
   day flavour  sales  year
7  sun  banana     24  2009

>>> d[d.sales==24].sales = 100
>>> d
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana     24  2009

So rather than setting 2009 Sunday's Banana sales to 100, nothing happens! What's the nicest way to do this? Ideally the solution should use the row number, as you normally don't know that in advance!

feetwet
  • 2,702
  • 6
  • 36
  • 73
LondonRob
  • 53,478
  • 30
  • 110
  • 152

3 Answers3

71

Many ways to do that

1

In [7]: d.sales[d.sales==24] = 100

In [8]: d
Out[8]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     12  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

2

In [26]: d.loc[d.sales == 12, 'sales'] = 99

In [27]: d
Out[27]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     23  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     23  2009
7  sun      banana    100  2009

3

In [28]: d.sales = d.sales.replace(23, 24)

In [29]: d
Out[29]: 
   day     flavour  sales  year
0  sat  strawberry     10  2008
1  sun  strawberry     99  2008
2  sat      banana     22  2008
3  sun      banana     24  2008
4  sat  strawberry     11  2009
5  sun  strawberry     13  2009
6  sat      banana     24  2009
7  sun      banana    100  2009
waitingkuo
  • 69,398
  • 23
  • 102
  • 115
  • 2
    Yes! Solution 1. worked. Sort of counter-intuitive that this works: `d.sales[d.sales==24] = 100` but this doesn't: `d[d.sales==24].sales=100`. They look (functionally) to be the same to me. Ah well. Thanks @waitingkuo. – LondonRob Jul 18 '13 at 17:47
  • 1
    d[d.sales==24] generate a new object. – waitingkuo Jul 19 '13 at 02:15
  • 1
    re. @waitingkuo's comment: is that expected behaviour, pandas guys? Certainly not intuitive that `d[d.sales==24]` should generate a copy of the original `DataFrame`. In fact, I'd say that every object should be a reference to the original (including selecting a single row which, correctly, 'collapses' to a pandas `Series`) unless explicitly requested by the user (via some kind of `copy=True`). Thoughts? – LondonRob Jul 19 '13 at 16:02
  • 2
    FYI: these now will raise/warn in 0.13 see (here)[http://pandas.pydata.org/pandas-docs/dev/indexing.html#indexing-view-versus-copy) – Jeff Jan 08 '14 at 14:31
  • @Jeff How to handle these warning, or is there any other right way to do this without warning? – Adarsh Maurya Mar 27 '19 at 14:47
  • is there a way to not use `d.sales`? i prefer to not use this syntax as sometimes column names have space in it – cryanbhu Oct 09 '19 at 14:15
  • Beware, .loc will turn all columns in objects. – Harvey Jan 20 '20 at 10:11
12

Not sure about older version of pandas, but in 0.16 the value of a particular cell can be set based on multiple column values.

Extending the answer provided by @waitingkuo, the same operation can also be done based on values of multiple columns.

d.loc[(d.day== 'sun') & (d.flavour== 'banana') & (d.year== 2009),'sales'] = 100
ram
  • 457
  • 1
  • 5
  • 9
5

Old question, but I'm surprised nobody mentioned numpy's .where() functionality (which can be called directly from the pandas module).

In this case the code would be:

d.sales = pd.np.where(d.sales == 24, 100, d.sales)

To my knowledge, this is one of the fastest ways to conditionally change data across a series.

elPastor
  • 6,274
  • 8
  • 41
  • 68