How to replace values in pandas data frame according to a function

Question

I have a pandas data frame that looks like this:

   0   1    2    3    4    5    6    7    8    9  ...  253  254  255  256  257  258  259  260  261  262
   0  30   84  126  135  137  179  242  342  426  ...    0    0    0    0    0    0    0    0    0    0
   1  24   53   75  134  158  192  194  211  213  ...    0    0    0    0    0    0    0    0    0    0
   2  51  143  173  257  446  491  504  510  559  ...    0    0    0    0    0    0    0    0    0    0
   3   1   20   22   92  124  149  211  335  387  ...    0    0    0    0    0    0    0    0    0    0
   4  34   51   56  106  110  121  163  233  266  ...    0    0    0    0    0    0    0    0    0    0

I want to divide each number in the data frame by 7 and put the result (the remainder, as in i % 7) in the data frame instead of the number. I was testing with a for loop, but it doesn't work for me:

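# x is the DataFrame
for i in x:              # iterating a DataFrame yields its column labels, not cell values
    y = i % 7
    if y == 0:
        x.replace(i, 7)  # returns a new DataFrame; the result is discarded here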

I expected this to work, but when I print the data frame I can't see any change. I even tried replacing a specific value, but nothing changed.
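A minimal sketch of why a loop like the one above appears to do nothing, using a small hypothetical frame `x` (not the asker's data): iterating over a DataFrame yields its column labels rather than its values, and `replace()` returns a new DataFrame instead of modifying the original unless the result is assigned back.

```python
import pandas as pd

# Hypothetical small frame standing in for the real data
x = pd.DataFrame({"a": [0, 28, 30], "b": [56, 8, 0]})

# Iterating a DataFrame yields its column labels, not the cell values
labels = [label for label in x]
print(labels)  # ['a', 'b']

# replace() returns a new DataFrame; the original is left untouched
replaced = x.replace(28, 7)
print(x.loc[1, "a"])         # 28, original unchanged
print(replaced.loc[1, "a"])  # 7

# Assigning the result back makes the change visible
x = x.replace(28, 7)
print(x.loc[1, "a"])         # 7
```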

How should I do it? I'm also wondering which solution is best memory-wise, since I'm trying to scale this to a bigger data frame.
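One memory-conscious approach, sketched here as an assumption rather than taken from the answers below: compute the result with vectorized NumPy operations instead of a Python loop, and, since every result lies in the range 0..7, store it as `int8` (1 byte per cell) instead of the default `int64` (8 bytes per cell). The random 5 x 263 frame is hypothetical stand-in data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 5 rows x 263 columns of non-negative integers
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 600, size=(5, 263)))

values = df.to_numpy()
rem = values % 7                      # new array holding the remainders
rem[(rem == 0) & (values != 0)] = 7   # non-zero multiples of 7 become 7; real zeros stay 0

# Results fit in 0..7, so int8 is enough and uses 1/8 of the int64 memory
small = pd.DataFrame(rem.astype(np.int8), index=df.index, columns=df.columns)

print(df.memory_usage(index=False).sum())     # int64: 8 bytes per cell
print(small.memory_usage(index=False).sum())  # int8: 1 byte per cell
```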

Let's say we have a row like this:

 0 8 30 28 36 40 45 0 56

The output I want is:

 0 1 2 7 1 5 3 0 7
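The desired mapping can be checked on this example row in plain Python. A minimal sketch, assuming the rule clarified in the comments below (remainder of division by 7, except that a non-zero multiple of 7 becomes 7 and 0 stays 0); for positive values this equals `(v - 1) % 7 + 1`. The helper name `day_code` is hypothetical.

```python
def day_code(v):
    """Hypothetical helper: map a value to 1..7 (7 for non-zero multiples of 7); keep 0 as 0."""
    if v == 0:
        return 0  # 0 means "no visit" and must stay distinct from Sunday
    return (v - 1) % 7 + 1  # same as v % 7, except multiples of 7 give 7 instead of 0

row = [0, 8, 30, 28, 36, 40, 45, 0, 56]
result = [day_code(v) for v in row]
print(result)  # [0, 1, 2, 7, 1, 5, 3, 0, 7]
```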

Thanks in advance

jezrael · Accepted Answer · 2017-10-18T13:27:06.133

1

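Use numpy.where with a chained condition that checks for 0: where the remainder is 0 but the original value is not 0, put 7; otherwise keep the remainder.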

print (df)
   0   1    2    3    4    5    6    7    8    9  253  254  255  256  257
0  0   8   30   28   36   40   45    0   56  426    0    0    0    0    0
1  1  24   53   75  134  158  192  194  211  213    0    0    0    0    0
2  2  51  143  173  257  446  491  504  510  559    0    0    0    0    0
3  3   1   20   22   92  124  149  211  335  387    0    0    0    0    0
4  4  34   51   56  106  110  121  163  233  266    0    0    0    0    0

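import numpy as np
import pandas as pd

mdf = df % 7  # remainder of each value
# Remainder 0 on a non-zero value (e.g. 28 or 56) -> 7; original zeros stay 0
df = pd.DataFrame(np.where((mdf == 0) & (df != 0), 7, mdf),
                  columns=df.columns,
                  index=df.index)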
print (df)
   0  1  2  3  4  5  6  7  8  9  253  254  255  256  257
0  0  1  2  7  1  5  3  0  7  6    0    0    0    0    0
1  1  3  4  5  1  4  3  5  1  3    0    0    0    0    0
2  2  2  3  5  5  5  1  7  6  6    0    0    0    0    0
3  3  1  6  1  1  5  2  1  6  2    0    0    0    0    0
4  4  6  2  7  1  5  2  2  2  7    0    0    0    0    0
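An equivalent formulation, offered as an alternative sketch rather than the answer's own code, uses DataFrame.mask, which replaces values where a condition is true and keeps the existing row and column labels, so no pd.DataFrame(...) rebuild is needed. The data is the sample row from the question.

```python
import pandas as pd

# Sample row from the question
df = pd.DataFrame([[0, 8, 30, 28, 36, 40, 45, 0, 56]])

mdf = df % 7
# Where the remainder is 0 but the original value is not 0, use 7 instead
out = mdf.mask((mdf == 0) & (df != 0), 7)
print(out)
```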

edited Oct 18 '17 at 13:27

answered Oct 18 '17 at 05:11


OK, it's nearly what I'm looking for, but I want to exclude all the zeros, meaning leave the zeros alone and work on the other values, or, if possible, replace the zeros from the mask with another value, say 7, because they represent different labels – Muhammed Eltabakh Oct 18 '17 at 05:43
And what does the 7 parameter stand for? – Muhammed Eltabakh Oct 18 '17 at 05:44
So you don't want to replace columns that contain only `0`? – jezrael Oct 18 '17 at 05:46
No, I want to replace the zeros resulting from this mask with another value, to distinguish them from the zeros that existed before. This data represents days of different weeks and zero means no days, but after the mask, zero would represent both Sundays and the days he didn't show up – Muhammed Eltabakh Oct 18 '17 at 05:53
Can you add the desired output? – jezrael Oct 18 '17 at 05:55
Unfortunately no, but I will try to explain more. This is the description of my data: 0: no visit in the next week, 1: Monday, 2: Tuesday, 3: Wednesday, 4: Thursday, 5: Friday, 6: Saturday, 7: Sunday. If I get a Sunday, the output of i % 7 will be zero; if i = 0, the output will also be zero. So I want Sundays to have a different value, meaning I want to replace the zeros from the mask with another value. I hope it is clear now – Muhammed Eltabakh Oct 18 '17 at 06:06
Sorry, I am lost. Why is it not possible to add the desired output for your sample data? Maybe it would help. – jezrael Oct 18 '17 at 07:29
Or do you want to set every 7th column to `7`? – jezrael Oct 18 '17 at 08:10
Let's say we have a row like this: 0 8 30 28 36 40 45 0 56. The output I want is: 0 1 2 7 1 5 3 0 7 – Muhammed Eltabakh Oct 18 '17 at 12:54
And the logic? Why `5`? Why `3`? – jezrael Oct 18 '17 at 12:55
It's the same as x % 7, except for values like 56 or 28: when the output is zero, I want it to be 7 – Muhammed Eltabakh Oct 18 '17 at 12:57
And zero values should remain zeros – Muhammed Eltabakh Oct 18 '17 at 13:00
I get it, please give me a sec – jezrael Oct 18 '17 at 13:02
I edited the answer; the first row was also created from your new data, please check it. – jezrael Oct 18 '17 at 13:27
Thanks a lot, it is exactly what I'm looking for. I want to try something different with the same data; could you take a look at this question and help if you can? https://stackoverflow.com/questions/46822149/how-can-i-sessionize-rows-in-pandas-based-on-week-days – Muhammed Eltabakh Oct 19 '17 at 02:52

score 0 · Answer 2 · answered Oct 20 '17 at 19:29

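The DataFrame method df.apply() applies a function along an axis: by default it is called once per column, with the column passed in as a Series. Because arithmetic on a Series is element-wise, a function such as x / 7 still acts on every cell.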

import pandas as pd

# Just an example df
df = pd.DataFrame(data={"Column1":[7*x for x in range(1,11)], "Column2":[7*x for x in range(11,21)]})

print(df)

   Column1  Column2
0        7       77
1       14       84
2       21       91
3       28       98
4       35      105
5       42      112
6       49      119
7       56      126
8       63      133
9       70      140

Below is a simple function, and how to apply it.

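Note that apply() returns a new DataFrame rather than changing the original, so you'll need to store the result in a variable (printing it shows the result, but df itself is unchanged).

The function assumes Python 3. If using Python 2, division works differently: / between two plain integers performs floor division there.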

def divide_by_7(x):
    # With apply(), x is a whole column (a Series); the division is element-wise
    return x / 7

df2 = df.apply(divide_by_7)

print(df2)

   Column1  Column2
0      1.0     11.0
1      2.0     12.0
2      3.0     13.0
3      4.0     14.0
4      5.0     15.0
5      6.0     16.0
6      7.0     17.0
7      8.0     18.0
8      9.0     19.0
9     10.0     20.0

To pass arguments to the function beyond the data it is applied to, use the args parameter of the apply() method.

# A more flexible division function
def divide_by_n(x, n):
    return x / n

# Extra positional arguments are passed to the function as a tuple via args
df3 = df.apply(divide_by_n, args=(7,))

print(df3)

   Column1  Column2
0      1.0     11.0
1      2.0     12.0
2      3.0     13.0
3      4.0     14.0
4      5.0     15.0
5      6.0     16.0
6      7.0     17.0
7      8.0     18.0
8      9.0     19.0
9     10.0     20.0

apply() has other uses too, for example creating a new column from existing ones; the pandas documentation has examples.
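For a function that really receives one cell at a time, pandas has an element-wise method: DataFrame.applymap, renamed DataFrame.map in pandas 2.1. A minimal sketch applying the asker's rule (remainder mod 7, non-zero multiples of 7 become 7, 0 stays 0) this way; the helper name `day_code` and the sample data are hypothetical. It calls a Python function per cell, so on large frames it is much slower than the vectorized numpy.where approach in the accepted answer.

```python
import pandas as pd

def day_code(v):
    # Hypothetical helper: remainder mod 7, but non-zero multiples of 7 -> 7; 0 stays 0
    if v == 0:
        return 0
    return (v - 1) % 7 + 1

df = pd.DataFrame({"Column1": [0, 8, 28], "Column2": [56, 45, 0]})

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
result = elementwise(day_code)
print(result)
```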

OK, it's nearly the answer I'm looking for. Could you help me with this question using pandas too? https://stackoverflow.com/questions/46841519/how-to-sessionize-data-in-pandas-rows – Muhammed Eltabakh, Oct 21 '17 at 00:13
