1

This is an example of the data I have in my dataframe:

test = pd.DataFrame({
    'month': [1,2,3,4,5,6,7,8,9],
    'new': [23,45,67,89,12,34,56,90,12],
    'drop': [2,4,7,9,1,4,6,9,1],
})

month   new drop
0   1   23  2
1   2   45  4
2   3   67  7
3   4   89  9
4   5   12  1
5   6   34  4
6   7   56  6
7   8   90  9
8   9   12  1

I need to calculate the monthly churn rate. I need to sum 2 rows in the new column and then divide the value in drop by this sum (in %).

    month 1: 2*100/23
    month 2: 4*100/(23+45-2)
    month 3: 7*100/(23+45+67-2-4)

    etc.

Could anyone, please, suggest an elegant way of doing this?

aviss
  • 1,527
  • 4
  • 21
  • 40

2 Answers2

5

You need:

test['drop'].mul(100).div((test['new'].cumsum() - test['drop'].cumsum().shift()).fillna(test['new']))

Output:

0    8.695652
1    6.060606
2    5.426357
3    4.265403
4    0.467290
5    1.619433
6    2.006689
7    2.349869
8    0.259067
dtype: float64

Explanation:

(test['new'].cumsum() - test['drop'].cumsum().shift()).fillna(test['new'])

Provides the cumsum of new with subtraction with previous drop cumsum.

Output (comments added for explanation):

0     23.0 # 23
1     66.0 # 23+45-2
2    129.0 # 23+45+67-2-4
3    211.0
4    214.0
5    247.0
6    299.0
7    383.0
8    386.0
harvpan
  • 7,966
  • 2
  • 16
  • 33
  • I think we need .cumsum() with shift otherwise in substruct drop numbers only from the previous month... – aviss Jul 25 '19 at 17:55
  • @aviss, I do not understand. Does't this give you the output you wanted? – harvpan Jul 25 '19 at 17:57
  • I updated my question with another line for month 3. We need to substruct off previous values from drop. – aviss Jul 25 '19 at 18:07
1
test = pd.DataFrame(
    {
        'month': [1, 2, 3, 4, 5, 6, 7, 8, 9],
        'new': [23, 45, 67, 89, 12, 34, 56, 90, 12],
        'drop': [2, 4, 7, 9, 1, 4, 6, 9, 1],
    }
)
df2 = test.assign(
    shifted_drop=lambda x: x['drop'].cumsum().shift(1).fillna(0.0),
    shifted_new=lambda x: x['new'].shift(1).fillna(0.0),
    churn=lambda x: x['drop'] * 100 / (x['new'] + x['shifted_new'] - x['shifted_drop'])
)[['month', 'churn']]

The result

   month     churn
0      1  8.695652
1      2  6.060606
2      3  5.426357
3      4  4.265403
4      5  0.467290
5      6  1.619433
6      7  2.006689
7      8  2.349869
8      9  0.259067

I check the result for the two first rows.

ndclt
  • 1,727
  • 2
  • 7
  • 22