8

I have a dictionary dict contains many (more than 100) dataframes. Each dataframe contains two variable name and 'value_i'. For example, the first dataframe in this dictionary dict[1] looks like the following:

name  value_1
A      1
B      1.1 
C      2

Similarly, the second dataframe in this dictionary dict2 looks like the following:

name  value_2
A      1
B      1.1 
D      1.3

I want to merge all dataframes within this dictionary by common variable name.

The expected outcome should looks like the following:

name  value_1   value_2
A      1         1
B      1.1       1.1
C      2         nan
D     nan           1.3

I know I can do pd.merge[dict[i], dict[i-1], how = 'outer', on = 'name' many times to merge all dataframes together. But this is too inefficient.

I tried pd.concat(dict.values(), axis = 1, join='outer' But concat dont allow me to merge by key variable.

Can anyone teach me how to do it more efficiently please?

fly36
  • 483
  • 1
  • 5
  • 22
  • What is the expected output for your example? – Dani Mesejo Dec 26 '18 at 18:43
  • You should take a look at [this post](https://stackoverflow.com/questions/53645882/pandas-merging-101) as it covers a tremendous amount of merging info. – d_kennetz Dec 26 '18 at 18:43
  • I am not sure about all the answers. Looks like you want `pd.join(dict.keys(),how = 'outer', on = 'name')`, which is covered by the duplicate target. – kabanus Dec 26 '18 at 18:50
  • 2
    For pd.join(dict.keys(),how = 'outer', on = 'name', I got the following error message module 'pandas' has no attribute 'join' – fly36 Dec 26 '18 at 19:10

2 Answers2

5

Given

>>> df1                                                                                                                
  name  value_1
0    A      1.0
1    B      1.1
2    C      2.0
>>> df2                                                                                                                
  name  value_2
0    A      1.0
1    B      1.1
2    D      1.3

and

>>> d = {1: df1, 2: df2} 

you can issue

>>> from functools import partial, reduce                                                                              
>>>                                                                                                                    
>>> my_reduce = partial(pd.merge, on='name', how='outer')                                                              
>>> reduce(my_reduce, d.values())                                                                                      
  name  value_1  value_2
0    A      1.0      1.0
1    B      1.1      1.1
2    C      2.0      NaN
3    D      NaN      1.3
timgeb
  • 64,821
  • 18
  • 95
  • 124
2

Consider you have multiple items in dict

d={i : j.set_index('name') for i,j in d.items()}
yourdf=pd.concat(d,axis=1)
BENY
  • 258,262
  • 17
  • 121
  • 165
  • Works if the name column in OP's dfs have all unique values. If that's the case I'm wondering why they ask about a merging, though. – timgeb Dec 26 '18 at 18:55
  • In this case, the variable name changed to (i, 'value_i'). Is there anyway to keep the original variable name. – fly36 Dec 26 '18 at 19:11
  • @timgeb since the merge is well explained in the link 101 , so that I am trying some thing different – BENY Dec 26 '18 at 19:20
  • @fly36 check https://stackoverflow.com/questions/22233488/pandas-drop-a-level-from-a-multi-level-column-index – BENY Dec 26 '18 at 19:22