0

I want to merge two DataFrames df1 and df2 such that I receive df_of_my_dreams, with:

df1:

    ID value
0   1    1a
1   2    2a
2   3    3a
3   6    6a 

df2:

    ID value
0   2    2a
1   4    4a
2   6    6a 

df_of_my_dreams:

    ID value
0   1    1a
1   2    2a
2   3    3a
3   4    4a
4   6    6a 

The closest I am currently getting to this is by using

df_my_failed_attempt = reduce(lambda  left,right: pd.merge(left,right,on=['ID'], how='outer', validate="one_to_one"), (df1,df2)).fillna('').sort_values(by='ID').reset_index(drop=True)

which results in:

df_my_failed_attempt:

    ID value_x value_y
0   1      1a        
1   2      2a      2a
2   3      3a        
3   4              4a
4   6      6a      6a 

Note that this is a minimum working example; in reality I want to merge dozens of DataFrames (hence the reduce & lambda function) and there are dozens of value columns in each. However, between DataFrames all value columns have a one-to-one translation -- just as in this minimum example.

Edit: Both anky's and Quang Hoang's excellent answers solved the above minimum example. My real application also has a few columns which are unique to each input DataFrame, for which Quang Hoang's answer also solved everything.

MaxG
  • 233
  • 3
  • 8
  • 3
    how about concat the list of dataframes and drop duplicates taking ID as subset , `pd.concat([df1,df2]).drop_duplicates('ID')` , you can chain a `sort_values()` on ID later if you want – anky Apr 29 '20 at 15:24
  • 3
    `df1.merge(df2, on=['ID','value'], how='outer')`? – Quang Hoang Apr 29 '20 at 15:29
  • Thanks anky and Quang, that solved things (see my edit). And thanks r.ook, I read through that post before asking here; while it was eye-opening in many regards, it did not help me with this specific scenario (but I might have missed it). – MaxG Apr 29 '20 at 15:46

0 Answers0