I have a dataframe which is a series of rows containing identifiers of individuals, dates of their appointments and a numerical value.
For every row, I need the sum of the numerical values for that individual over the past year. I effectively want to do:
df.groupby('ID')['numerical value'].cumsum()
but I want the cumsum
to include only rows where the date is up to a year in the past.
I'm currently using the following apply function but it is very slow when I have a long dataframe.
def sum_hist(appt_date,personID,key):
search_limit=appt_date+pd.Timedelta("-365 days")
temp_df=df[(df["ID number"]==personID)
& (df["Appointment Date"]>search_limit) & (df["Appointment Date"]<=appt_date)]
return(sum(temp_df[key].values))
df[key]=df[["Appointment Date","ID number"]].apply(lambda x: sum_hist(*x,key),axis=1)
I'd appreciate any method to do that same thing that is more efficient.