How to efficiently do a sum over the previous year for many rows in a dataframe grouping by an ID column

Question

I have a dataframe which is a series of rows containing identifiers of individuals, dates of their appointments and a numerical value.

For every row, I need the sum of the numerical values for that individual over the past year. I effectively want to do:

df.groupby('ID')['numerical value'].cumsum()

but I want the cumsum to include only rows where the date is up to a year in the past.

I'm currently using the following apply function but it is very slow when I have a long dataframe.

def sum_hist(appt_date,personID,key):
    search_limit=appt_date+pd.Timedelta("-365 days")
    temp_df=df[(df["ID number"]==personID)
    & (df["Appointment Date"]>search_limit) & (df["Appointment Date"]<=appt_date)]
    return(sum(temp_df[key].values))

df[key]=df[["Appointment Date","ID number"]].apply(lambda x: sum_hist(*x,key),axis=1)

I'd appreciate any method to do that same thing that is more efficient.

Check this out https://stackoverflow.com/questions/49388065/1-year-rolling-mean-pandas-on-column-date — moys, Sep 09 '19 at 08:01
Thanks @SH-SF, that more or less solves it for me. I will edit my question as soon as I can get my grouped results back into dataframe. — Jon Holdship, Sep 09 '19 at 08:23
Consider creating an answer (and not putting your answer in the question). This way visitors see if there is an answer in list overviews on stackoverflow. — ilmiacs, Sep 09 '19 at 09:31

score 0 · Accepted Answer · answered Sep 09 '19 at 09:34

It turns out this can be achieved with the pandas rolling function:

df.groupby('ID number').rolling(window='365d',min_periods=1,on="Appointment Date")['numerical value'].sum()

It greatly improves the speed and makes my problem possible to run in less than a minute rather than the hours it was taking previously.

Thanks @SH-SF for the link to 1 Year Rolling mean pandas on column date

How to efficiently do a sum over the previous year for many rows in a dataframe grouping by an ID column

1 Answers1