I want to merge two DataFrames df1 and df2 such that I receive df_of_my_dreams, with:
df1:
ID value
0 1 1a
1 2 2a
2 3 3a
3 6 6a
df2:
ID value
0 2 2a
1 4 4a
2 6 6a
df_of_my_dreams:
ID value
0 1 1a
1 2 2a
2 3 3a
3 4 4a
4 6 6a
The closest I am currently getting to this is by using
df_my_failed_attempt = reduce(lambda left,right: pd.merge(left,right,on=['ID'], how='outer', validate="one_to_one"), (df1,df2)).fillna('').sort_values(by='ID').reset_index(drop=True)
which results in:
df_my_failed_attempt:
ID value_x value_y
0 1 1a
1 2 2a 2a
2 3 3a
3 4 4a
4 6 6a 6a
Note that this is a minimum working example; in reality I want to merge dozens of DataFrames (hence the reduce & lambda function) and there are dozens of value columns in each. However, between DataFrames all value columns have a one-to-one translation -- just as in this minimum example.
Edit: Both anky's and Quang Hoang's excellent answers solved the above minimum example. My real application also has a few columns which are unique to each input DataFrame, for which Quang Hoang's answer also solved everything.