34

There has been many similar questions but none specifically to this.

I have a list of data frames and I need to merge them together using a unique column (date). Field names are different so concat is out.

I can manually use df[0].merge(df[1],on='Date').merge(df[3],on='Date) etc. to merge each df one by one, but the issue is that the number of data frames in the list differs with user input.

Is there any way to merge that just combines all data frames in a list at one go? Or perhaps some for in loop at does that?

I am using Python 2.7.

Jake
  • 1,259
  • 2
  • 15
  • 27

1 Answers1

96

You can use reduce function where dfList is your list of data frames:

import pandas as pd
from functools import reduce
reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)

As a demo:

df = pd.DataFrame({'Date': [1,2,3,4], 'Value': [2,3,3,4]})
dfList = [df, df, df]
dfList

# [   Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,    Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4,    Date  Value
#  0     1      2
#  1     2      3
#  2     3      3
#  3     4      4]

reduce(lambda x, y: pd.merge(x, y, on = 'Date'), dfList)
#   Date  Value_x  Value_y  Value
# 0    1        2        2      2
# 1    2        3        3      3
# 2    3        3        3      3
# 3    4        4        4      4
Gustavo
  • 339
  • 4
  • 14
Psidom
  • 171,477
  • 20
  • 249
  • 286
  • 3
    wow... thanks! I never really understand what's lambda. Looks like its time to read up on it! – Jake Jun 29 '16 at 02:08
  • 1
    Just to mention it. map() or listcomprehension would also work. For speed i would first convert everything to dict and than create only one data frame from it. – PlagTag Jun 30 '16 at 06:43
  • Thanks for this. When I run it I go from 167428 rows x 85 columns to 0 rows × 29233 columns - any ideas why that might be? – Maverick Jul 21 '17 at 16:15
  • 2
    @Maverick `pd.merge` does inner join by default, so if there is no overlap on the Date column across all data frames in the list, you will have zero rows. – Psidom Jul 21 '17 at 16:20
  • 42
    Import reduce with: "from functools import reduce". If you're using python 3+ – Matheus Araujo Jun 29 '18 at 10:27
  • me too @Jake me too hehe – Marcelo Gazzola Nov 12 '20 at 19:16