I have a pandas DataFrame organized by date that I'm trying to split up by year (stored in a column called 'year'). I want to end up with one DataFrame per year, each with a name something like "df19XX".

I was hoping to write a for loop that can handle this, something like:

for d in [1980, 1981, 1982]:
    df(d) = df[df['year']==d]

... which would return three data frames called df1980, df1981 and df1982.

thanks!

steadynappin

2 Answers

You can iterate through the groupby:

In [11]: df = pd.DataFrame({"date": pd.date_range("2012-12-28", "2013-01-03"), "A": np.random.rand(7)})

In [12]: df
Out[12]:
          A       date
0  0.434715 2012-12-28
1  0.208877 2012-12-29
2  0.912897 2012-12-30
3  0.226368 2012-12-31
4  0.100489 2013-01-01
5  0.474088 2013-01-02
6  0.348368 2013-01-03

In [13]: g = df.groupby(df.date.dt.year)

In [14]: for k, v in g:
    ...:     print(k)
    ...:     print(v)
    ...:     print()
    ...:
2012
          A       date
0  0.434715 2012-12-28
1  0.208877 2012-12-29
2  0.912897 2012-12-30
3  0.226368 2012-12-31

2013
          A       date
4  0.100489 2013-01-01
5  0.474088 2013-01-02
6  0.348368 2013-01-03

I would strongly argue that it is preferable to simply have a dict rather than having separate variables and messing around with the locals() dictionary (I claim that using locals() like this is not "pythonic"):

In [14]: {k: grp for k, grp in g}
Out[14]:
{2012:           A       date
 0  0.434715 2012-12-28
 1  0.208877 2012-12-29
 2  0.912897 2012-12-30
 3  0.226368 2012-12-31, 2013:           A       date
 4  0.100489 2013-01-01
 5  0.474088 2013-01-02
 6  0.348368 2013-01-03}
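
Applied to the layout in the question, where the year already sits in a column called 'year', the same dict idea might look like the sketch below. The sample data and the name frames_by_year are made up for illustration; only the 'year' column comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the question only states that a 'year' column exists.
df = pd.DataFrame({
    "year": [1980, 1980, 1981, 1982, 1982],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# One DataFrame per year, keyed by the year itself instead of by a variable name.
frames_by_year = {year: group for year, group in df.groupby("year")}

# Look a year up where the question would have used the variable df1980.
df1980 = frames_by_year[1980]
print(df1980)
```

Keying by year also means the list of years does not have to be written out by hand; the dict contains whatever years appear in the column.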

Though you might consider calculating this on the fly (rather than storing in a dict or indeed a variable). You can use get_group:

In [15]: g.get_group(2012)
Out[15]:
          A       date
0  0.434715 2012-12-28
1  0.208877 2012-12-29
2  0.912897 2012-12-30
3  0.226368 2012-12-31
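
For a frame that already has a 'year' column, as in the question, fetching a single year on demand could be sketched as follows. The sample data and the names by_year, rows_1981 and same_rows are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data with the 'year' column described in the question.
df = pd.DataFrame({
    "year": [1980, 1980, 1981, 1982],
    "value": [10, 20, 30, 40],
})

by_year = df.groupby("year")

# Pull out one year's rows only when they are needed.
rows_1981 = by_year.get_group(1981)

# A plain boolean mask selects the same rows without a groupby object.
same_rows = df[df["year"] == 1981]

print(rows_1981)
print(same_rows)
```

Asking get_group for a year that is not present raises a KeyError, so it may be worth checking the key against by_year.groups first.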
Andy Hayden

Something like this? This also uses @Andy's df:

variables = locals()
for i in [2012, 2013]:
    variables["df{0}".format(i)]=df.loc[df.date.dt.year==i]
df2012
Out[118]: 
          A       date
0  0.881468 2012-12-28
1  0.237672 2012-12-29
2  0.992287 2012-12-30
3  0.194288 2012-12-31
df2013
Out[119]: 
          A       date
4  0.151854 2013-01-01
5  0.855312 2013-01-02
6  0.534075 2013-01-03
BENY
  • Ha, @piRSquared had that answer and deleted it (soon you will be able to see deleted answers :p). My comment there was that messing with the `locals()` dict is considered bad form/not pythonic, so whilst this does answer the question, I would argue that you don't actually want these as variables - a dict would be better. – Andy Hayden Nov 10 '17 at 04:46
  • @AndyHayden got you ~ :-) Thanks for the explanation, I will use your way next time ... – BENY Nov 10 '17 at 04:51
  • I would like to have a reference; this goes some way towards one: https://stackoverflow.com/a/1551187/1240268 and I guess this does too: https://stackoverflow.com/a/5036775/1240268 – Andy Hayden Nov 10 '17 at 04:55
  • "should I forbidden using the locals in the future" yes, I think so. :p – Andy Hayden Nov 10 '17 at 04:57
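
One concrete reason to be wary of the locals() approach, sketched here for CPython: at the top level of a script or an IPython session, locals() is the module namespace, so the assignment sticks, but inside a function a write to locals() does not create a usable local variable. The function name and placeholder value below are made up for illustration.

```python
def try_locals_inside_function():
    # Inside a function, writing to the dict returned by locals()
    # does not define a real local variable.
    locals()["df2012"] = "placeholder"
    try:
        # No assignment to df2012 exists in this function, so the name is
        # looked up as a global, which is not defined here.
        return df2012
    except NameError:
        return None


result = try_locals_inside_function()
print(result)
```

A dict keyed by year behaves the same way inside and outside functions, which is part of why it is the more robust choice.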