Pandas groupby & linregress: how to extract the results into columns

Question

I am doing linear regression on a dataframe by group to generate summary statistics. I have calculated the regression of two variables, km vs price, using scipy linregress:

import pandas as pd
from scipy.stats import linregress    
df = pd.read_csv('test dataset faceted small.csv')
grouped = df.groupby(['year','make','engine','drive','transmission','badge'])
test = grouped.apply(lambda x: linregress(x['km'], x['price']))
print(test)
test.to_csv('grouped.csv', index=False)

print(test) gives me:

year  make    engine  drive  transmission  badge                
1994  subaru  1.6L    awd    auto          wrx                      (-0.0019029525668, 2217.67284738, -0.190381626...
1997  mazda   1.3L    2wd    manual        121 metro                (-0.00724142957301, 4213.71579612, -0.30608491...
1999  nissan  1.6L    2wd    auto          pulsar plus lx n15 s2    (-0.00245336355614, 3653.42015515, -0.17060101...

And test saved to CSV is:

LinregressResult(slope=-0.0019029525667976811, intercept=2217.6728473825792, rvalue=-0.19038162624636565, pvalue=4.2750387135904842e-07, stderr=0.00037275167083276965)
LinregressResult(slope=-0.0072414295730094738, intercept=4213.7157961188113, rvalue=-0.30608491681348643, pvalue=4.8781453623746113e-17, stderr=0.00084171437048465665)
LinregressResult(slope=-0.0024533635561369252, intercept=3653.4201551461483, rvalue=-0.17060101350197393, pvalue=1.4676330869804576e-07, stderr=0.0004631573671617427)

However, my desired CSV output is:

year  make    engine  drive  transmission  badge                   slope             intercept       rvalue      
1994  subaru  1.6L    awd    auto          wrx                     -0.0019029525668  2217.67284738 -0.190381626...
1997  mazda   1.3L    2wd    manual        121 metro               -0.00724142957301 4213.71579612 -0.30608491...
1999  nissan  1.6L    2wd    auto          pulsar plus lx n15 s2   -0.00245336355614 3653.42015515 -0.17060101...

I want this layout so that I can look up the results easily later on. How can I split the LinregressResult for each group into columns and save them to CSV?

score 4 · Answer 1 · answered May 07 '16 at 08:39

I guess you can simply do this:

test = (grouped.apply(lambda x: pd.Series(linregress(x['km'], x['price'])))
               .rename(columns={
                        0: 'slope',
                        1: 'intercept',
                        2: 'rvalue',
                        3: 'pvalue',
                        4: 'stderr'
                      })
       )

instead of

test = grouped.apply(lambda x: linregress(x['km'], x['price']))

Demonstration:

import numpy as np
import pandas as pd

rows = 10

# generate random integers in [0, 10)
df = pd.DataFrame(np.random.randint(0, 10, size=(rows, 5)), columns=list('abcde'))

def linregress(x):
    # stand-in for scipy's `linregress`:
    # returns the five values of a row as a tuple
    return tuple(x)

test = (df.apply(lambda x: pd.Series(linregress(x)), axis=1)
          .rename(columns={
                   0: 'slope',
                   1: 'intercept',
                   2: 'rvalue',
                   3: 'pvalue',
                   4: 'stderr'
                 })
       )

Output:

In [48]: df.apply(lambda x: linregress(x), axis=1)
Out[48]:
0    (7, 7, 2, 0, 0)
1    (6, 9, 3, 1, 5)
2    (5, 1, 6, 1, 3)
3    (4, 4, 2, 1, 4)
4    (8, 7, 1, 5, 4)
5    (0, 2, 7, 6, 1)
6    (3, 8, 4, 2, 8)
7    (6, 0, 0, 3, 2)
8    (9, 4, 6, 2, 3)
9    (8, 1, 7, 9, 8)
dtype: object


In [50]: test = (df.apply(lambda x: pd.Series(linregress(x)), axis=1)
   ....:           .rename(columns={
   ....:                    0: 'slope',
   ....:                    1: 'intercept',
   ....:                    2: 'rvalue',
   ....:                    3: 'pvalue',
   ....:                    4: 'stderr'
   ....:                  })
   ....:        )

In [51]: test
Out[51]:
   slope  intercept  rvalue  pvalue  stderr
0      7          7       2       0       0
1      6          9       3       1       5
2      5          1       6       1       3
3      4          4       2       1       4
4      8          7       1       5       4
5      0          2       7       6       1
6      3          8       4       2       8
7      6          0       0       3       2
8      9          4       6       2       3
9      8          1       7       9       8
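
The demonstration above uses a stand-in function, so here is a minimal sketch of the full round trip with the real linregress, including the CSV step. The data is made up (two groups keyed by year and make only, not the six keys from the question), and the helper name fit is hypothetical. One thing to note: after groupby + apply the group keys live in the index, so to_csv(..., index=False) as in the question would drop them. Calling reset_index() first turns them back into ordinary columns.

```python
import io

import numpy as np
import pandas as pd
from scipy.stats import linregress

# Made-up stand-in for the question's CSV: two groups of five cars each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "year": [1994] * 5 + [1997] * 5,
    "make": ["subaru"] * 5 + ["mazda"] * 5,
    "km": rng.integers(50_000, 300_000, size=10),
})
df["price"] = 5000 - 0.01 * df["km"] + rng.normal(0, 100, size=10)


def fit(group):
    # Hypothetical helper: one regression per group, fields as named entries.
    lr = linregress(group["km"], group["price"])
    return pd.Series(
        [lr.slope, lr.intercept, lr.rvalue, lr.pvalue, lr.stderr],
        index=["slope", "intercept", "rvalue", "pvalue", "stderr"],
    )


# Select km and price explicitly so only those columns reach `fit`.
result = df.groupby(["year", "make"])[["km", "price"]].apply(fit)

# Move the group keys from the index into columns before writing.
flat = result.reset_index()

# Written to an in-memory buffer here; pass a filename to write a real file.
buffer = io.StringIO()
flat.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Alternatively, keep the keys in the index and call result.to_csv('grouped.csv') without index=False; both give one row per group with the keys in the leading columns.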

score 2 · Answer 2 · answered May 07 '16 at 05:56

Solution

Use this function in the apply.

def extract_lr(x):
    lr = linregress(x['km'], x['price'])
    return pd.Series([lr.slope, lr.intercept, lr.rvalue],
                     index=['slope', 'intercept', 'rvalue'])

test = grouped.apply(extract_lr)
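
Grouping on many keys can leave some groups with a single row or with identical km values, where a slope is undefined. Below is a hedged sketch of a guarded variant of extract_lr. The name extract_lr_safe, the FIELDS list and the toy data are assumptions for illustration, not part of the original answer. It returns NaN for such groups instead of calling linregress on them.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

FIELDS = ["slope", "intercept", "rvalue"]


def extract_lr_safe(x):
    # Hypothetical guard: a line needs at least two distinct km values.
    if x["km"].nunique() < 2:
        return pd.Series([np.nan] * len(FIELDS), index=FIELDS)
    lr = linregress(x["km"], x["price"])
    return pd.Series([lr.slope, lr.intercept, lr.rvalue], index=FIELDS)


# Toy data: three subaru rows on an exact line, one lone mazda row.
df = pd.DataFrame({
    "make": ["subaru", "subaru", "subaru", "mazda"],
    "km": [10_000, 20_000, 30_000, 15_000],
    "price": [3000.0, 2900.0, 2800.0, 4000.0],
})

test = df.groupby("make")[["km", "price"]].apply(extract_lr_safe)
print(test)
```

The subaru group gets a real fit, while the single-row mazda group comes back as a row of NaN that can be filtered later with test.dropna().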
