Pandas : Replace values multiple times until the end

Question

I have a pandas df like this:

import pandas as pd
df = pd.DataFrame({'sales':[114,114,114,113,12,10,8500,8666]})

It is cumulative data from sales. There is a bug in the data : Some observations are wrong. (No value can be greater than the previous one : i.e., 8500 and 8666 are not correct) I want to replace these values with the previous one.

I tried this

df['above'] = df.sales.shift(1)
df.loc[df.above < df.sales, 'sales'] = df.above

But this code works only one time, how to extend this code to replace all values until the end of the series (with the previous one) ?

The desired result should looks like :

jezrael · Accepted Answer · 2017-12-04T09:37:33.577

You can replace values to NaNs by condition and then use ffill - fillna with method='ffill':

df.loc[df.above < df.sales, 'sales'] = np.nan
df['sales'] = df['sales'].ffill()
print (df)
   sales   above
0  114.0     NaN
1  114.0   114.0
2  114.0   114.0
3  113.0   114.0
4   12.0   113.0
5   10.0    12.0
6   10.0    10.0
7   10.0  8500.0

What is same as using mask, which by default replace to NaNs, last cast to int if always integer values:

df = pd.DataFrame({'sales':[114,114,114,113,12,10,8500,8666]})

df['sales'] = df['sales'].mask(df.sales.shift(1) < df.sales).ffill().astype(int)
print (df)
   sales
0    114
1    114
2    114
3    113
4     12
5     10
6     10
7     10

EDIT:

df = pd.DataFrame({'sales':[114,114,114,113,12,10,8500,8400]})

df['sales'] = df['sales'].mask((df.sales.shift(1) < df.sales).cumsum() > 0).ffill().astype(int)
print (df)
   sales
0    114
1    114
2    114
3    113
4     12
5     10
6     10
7     10

Detail:

print ((df.sales.shift(1) < df.sales).cumsum())
0    0
1    0
2    0
3    0
4    0
5    0
6    1
7    1
Name: sales, dtype: int32

print ((df.sales.shift(1) < df.sales).cumsum() > 0)
0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
Name: sales, dtype: bool

Thanks but it doesn't work when a value is lower than the value above : for example with this df = pd.DataFrame({'sales':[114,114,114,113,12,10,8500,8400]}) — SciPy, Dec 04 '17 at 09:26
So I think if `cumulative data` it is not possible. Please give me some time. — jezrael, Dec 04 '17 at 09:30
One question, do you need replace from first `True`s of mask to end of DataFrame? — jezrael, Dec 04 '17 at 09:33
yes ............Because the bug appears only at the end of the series (but the values are weird.) — SciPy, Dec 04 '17 at 09:35

score 0 · Answer 2 · answered Dec 04 '17 at 09:08

0

How about this?

df.loc[df['sales'] > 200] = 10

ref: https://stackoverflow.com/a/38467449/4772042

'200' is the proper value I caught. Change it to the desired value.

answered Dec 04 '17 at 09:08

Cho

105
3
11

Pandas : Replace values multiple times until the end

2 Answers2