3

So my df is per minute readings from sensor. I want to find slopes for every hour readings, meaning every 1 hour should have 1 slope value. How to I do that?

I've tried groupby.apply linregress , not working, also tried to groupby 60 rows, but that just gives mean values for the hour. Column 1 is T_a and Column 2 is Q_a. I want slope values for T_am vs Q_a for every hour.

df.head()
2019-01-09 17:03:00   3.09125   93.353877
2019-01-09 17:04:00   3.08575   89.513643
2019-01-09 17:05:00   3.10325   92.700350
2019-01-09 17:06:00   3.08075   91.089470
2019-01-09 17:07:00   3.08200   92.563898

df = df.groupby(index).apply(lambda df: linregress(df.T_am, df.Q_a)[0])

KeyError: Timestamp('2019-01-09 18:02:00')
Zee
  • 169
  • 9

1 Answers1

2

You need to groupby on your index hours, if you have only one day of measurements this should work:

df.groupby(index.hour).apply(lambda df: linregress(df.T_am, df.Q_a)[0])

If you instead have multiple days I would first create a column containing your index rounded to the hour and then group on that index:

df['index_hour']=df.index.floor('h')
df.groupby('index_hour').apply(lambda df: linregress(df.T_am, df.Q_a)[0])

If you want to keep several parameters from the linear regression you can use the following script:

def extract_lr(x):
    lr = linregress(x['T_am'], x['Q_a'])
    return pd.Series([lr.slope, lr.intercept, lr.rvalue, lr.pvalue],
                     index=['slope', 'intercept', 'rvalue','pvalue'])

df.groupby('index_hour').apply(lambda x: extract_lr(x))

credits to this answer.

baccandr
  • 842
  • 4
  • 15
  • Thank you so much. It really helped. I was stuck at this since morning. Further, I'm trying to concatenate this new dataframe which has slope values with another dataframe which has per hour readings. But it's not sync since some datapoints were filtered out while calculating slope. len(df)=650 len(df2)=1464 **df2 has index.hour** The new df should have slope value for that specific index.hour. – Zee Oct 24 '19 at 14:25
  • I'm trying to create new index 'new_index = df.reindex.union(df2.index)' AttributeError: 'function' object has no attribute 'union' What is the efficient to sync and do what I'm trying to do? – Zee Oct 24 '19 at 14:35
  • I would probably use `pd.join`, have a look at the documentation and at this very nicely written QnA: https://stackoverflow.com/questions/53645882/pandas-merging-101 – baccandr Oct 24 '19 at 14:38
  • Thanks. 'df.groupby('index_hour').apply(lambda df: linregress(df.T_am, df.Q_a)[0])' is giving me slope which is the first value, I'm want all values (intercept, rsquared)in columns. I've tried 'df.groupby('index_hour').apply(lambda df: linregress(df.T_am, df.Q_a)).rename(columns:{0:'Slope', 1:'Intercept', 2:'R_value', 3:'P_value', 4:'Std_der' })' But it's still in series object format. df[0] LinregressResult(slope=36,intercept=-97.5...) Is there a way to access these values? Preferable make them in seperate columns in same df. Thanks a lot. – Zee Oct 25 '19 at 09:17
  • By the way, try to edit your question with this additional request, so it's more clear. I already included this into my answer.... – baccandr Oct 25 '19 at 09:29
  • Tried that. Not working. Is there any other way I can access dtype Lineregress Results of each row? – Zee Oct 25 '19 at 09:35
  • Thanks. It's much better than what i was doing making a new df. df_neww = pd.DataFrame.from_records(df, columns = ['Slope','Intercept','R_value','p_value','Std_error']) df_neww = df_neww.set_index(df.index) Thanks a lot – Zee Oct 25 '19 at 12:31