I am running a linear regression per group on a dataframe to generate summary statistics. I have calculated the regression of two variables, km vs price, using scipy's linregress:
import pandas as pd
from scipy.stats import linregress

df = pd.read_csv('test dataset faceted small.csv')
# One regression of price on km per combination of these columns
grouped = df.groupby(['year','make','engine','drive','transmission','badge'])
test = grouped.apply(lambda x: linregress(x['km'], x['price']))
print(test)
# Note: index=False drops the group keys from the saved file
test.to_csv('grouped.csv', index=False)
Printing test gives me:
year make engine drive transmission badge
1994 subaru 1.6L awd auto wrx (-0.0019029525668, 2217.67284738, -0.190381626...
1997 mazda 1.3L 2wd manual 121 metro (-0.00724142957301, 4213.71579612, -0.30608491...
1999 nissan 1.6L 2wd auto pulsar plus lx n15 s2 (-0.00245336355614, 3653.42015515, -0.17060101...
And the contents of test saved to csv are:
LinregressResult(slope=-0.0019029525667976811, intercept=2217.6728473825792, rvalue=-0.19038162624636565, pvalue=4.2750387135904842e-07, stderr=0.00037275167083276965)
LinregressResult(slope=-0.0072414295730094738, intercept=4213.7157961188113, rvalue=-0.30608491681348643, pvalue=4.8781453623746113e-17, stderr=0.00084171437048465665)
LinregressResult(slope=-0.0024533635561369252, intercept=3653.4201551461483, rvalue=-0.17060101350197393, pvalue=1.4676330869804576e-07, stderr=0.0004631573671617427)
However, my desired csv output is:
year make engine drive transmission badge slope intercept rvalue
1994 subaru 1.6L awd auto wrx -0.0019029525668 2217.67284738 -0.190381626...
1997 mazda 1.3L 2wd manual 121 metro -0.00724142957301 4213.71579612 -0.30608491...
1999 nissan 1.6L 2wd auto pulsar plus lx n15 s2 -0.00245336355614 3653.42015515 -0.17060101...
I want this layout so that I can look up the results easily later on. How can I append the LinregressResult fields as columns to each group and save them to csv?
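To make the target layout concrete, here is a minimal sketch of one possible way to get it, not necessarily the best one. It assumes the function passed to apply may return a pandas Series, so that each field becomes its own column. The small dataframe is made-up stand-in data with only two of the grouping columns, and the helper name fit is hypothetical.

```python
import pandas as pd
from scipy.stats import linregress

# Made-up stand-in data; the real CSV is not reproduced here.
df = pd.DataFrame({
    'year': [1994] * 4 + [1997] * 4,
    'make': ['subaru'] * 4 + ['mazda'] * 4,
    'km': [10000, 50000, 90000, 130000, 20000, 60000, 100000, 140000],
    'price': [2100, 2060, 1990, 1950, 4100, 3700, 3500, 3100],
})

# Fields of the linregress result to expand into columns.
fields = ['slope', 'intercept', 'rvalue', 'pvalue', 'stderr']

def fit(group):
    # Hypothetical helper: regress price on km for one group and
    # return the result as a Series so each field becomes a column.
    result = linregress(group['km'], group['price'])
    return pd.Series({name: getattr(result, name) for name in fields})

# Select only the regression columns, fit per group, then move the
# group keys from the index back into ordinary columns.
summary = df.groupby(['year', 'make'])[['km', 'price']].apply(fit).reset_index()

# With no path, to_csv returns the CSV text; pass a path such as
# 'grouped.csv' to write a file instead.
print(summary.to_csv(index=False))
```

Calling reset_index before saving is what keeps the group keys in the file; with the keys left in the index, index=False would drop them again.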