How to merge all data-frames in a dictionary in Python

Question

I have a dictionary dict contains many (more than 100) dataframes. Each dataframe contains two variable name and 'value_i'. For example, the first dataframe in this dictionary dict[1] looks like the following:

name  value_1
A      1
B      1.1 
C      2

Similarly, the second dataframe in this dictionary dict2 looks like the following:

name  value_2
A      1
B      1.1 
D      1.3

I want to merge all dataframes within this dictionary by common variable name.

The expected outcome should looks like the following:

name  value_1   value_2
A      1         1
B      1.1       1.1
C      2         nan
D     nan           1.3

I know I can do pd.merge[dict[i], dict[i-1], how = 'outer', on = 'name' many times to merge all dataframes together. But this is too inefficient.

I tried pd.concat(dict.values(), axis = 1, join='outer' But concat dont allow me to merge by key variable.

Can anyone teach me how to do it more efficiently please?

You should take a look at [this post](https://stackoverflow.com/questions/53645882/pandas-merging-101) as it covers a tremendous amount of merging info. — d_kennetz, Dec 26 '18 at 18:43
I am not sure about all the answers. Looks like you want `pd.join(dict.keys(),how = 'outer', on = 'name')`, which is covered by the duplicate target. — kabanus, Dec 26 '18 at 18:50
For pd.join(dict.keys(),how = 'outer', on = 'name', I got the following error message module 'pandas' has no attribute 'join' — fly36, Dec 26 '18 at 19:10

score 5 · Accepted Answer · answered Dec 26 '18 at 18:48

Given

>>> df1                                                                                                                
  name  value_1
0    A      1.0
1    B      1.1
2    C      2.0
>>> df2                                                                                                                
  name  value_2
0    A      1.0
1    B      1.1
2    D      1.3

and

>>> d = {1: df1, 2: df2}

you can issue

>>> from functools import partial, reduce                                                                              
>>>                                                                                                                    
>>> my_reduce = partial(pd.merge, on='name', how='outer')                                                              
>>> reduce(my_reduce, d.values())                                                                                      
  name  value_1  value_2
0    A      1.0      1.0
1    B      1.1      1.1
2    C      2.0      NaN
3    D      NaN      1.3

score 2 · Answer 2 · answered Dec 26 '18 at 18:48

2

Consider you have multiple items in dict

d={i : j.set_index('name') for i,j in d.items()}
yourdf=pd.concat(d,axis=1)

answered Dec 26 '18 at 18:48

BENY

258,262
17
121
165

Works if the name column in OP's dfs have all unique values. If that's the case I'm wondering why they ask about a merging, though. – timgeb Dec 26 '18 at 18:55
In this case, the variable name changed to (i, 'value_i'). Is there anyway to keep the original variable name. – fly36 Dec 26 '18 at 19:11
@timgeb since the merge is well explained in the link 101 , so that I am trying some thing different – BENY Dec 26 '18 at 19:20
@fly36 check https://stackoverflow.com/questions/22233488/pandas-drop-a-level-from-a-multi-level-column-index – BENY Dec 26 '18 at 19:22

How to merge all data-frames in a dictionary in Python

2 Answers2