Pandas merge outer by ID: also merge all value columns instead of duplicating

Question

I want to merge two DataFrames df1 and df2 such that I receive df_of_my_dreams, with:

df1:

    ID value
0   1    1a
1   2    2a
2   3    3a
3   6    6a

df2:

    ID value
0   2    2a
1   4    4a
2   6    6a

df_of_my_dreams:

    ID value
0   1    1a
1   2    2a
2   3    3a
3   4    4a
4   6    6a

The closest I am currently getting to this is by using

df_my_failed_attempt = reduce(lambda  left,right: pd.merge(left,right,on=['ID'], how='outer', validate="one_to_one"), (df1,df2)).fillna('').sort_values(by='ID').reset_index(drop=True)

which results in:

df_my_failed_attempt:

    ID value_x value_y
0   1      1a        
1   2      2a      2a
2   3      3a        
3   4              4a
4   6      6a      6a

Note that this is a minimum working example; in reality I want to merge dozens of DataFrames (hence the reduce & lambda function) and there are dozens of value columns in each. However, between DataFrames all value columns have a one-to-one translation -- just as in this minimum example.

Edit: Both anky's and Quang Hoang's excellent answers solved the above minimum example. My real application also has a few columns which are unique to each input DataFrame, for which Quang Hoang's answer also solved everything.

how about concat the list of dataframes and drop duplicates taking ID as subset , `pd.concat([df1,df2]).drop_duplicates('ID')` , you can chain a `sort_values()` on ID later if you want — anky, Apr 29 '20 at 15:24
Thanks anky and Quang, that solved things (see my edit). And thanks r.ook, I read through that post before asking here; while it was eye-opening in many regards, it did not help me with this specific scenario (but I might have missed it). — MaxG, Apr 29 '20 at 15:46

Pandas merge outer by ID: also merge all value columns instead of duplicating

0 Answers0