Say I have the following 2 pandas dataframes:
import pandas as pd
A = [174,-155,-931,301]
B = [943,847,510,16]
C = [325,914,501,884]
D = [-956,318,319,-83]
E = [767,814,43,-116]
F = [110,-784,-726,37]
G = [-41,964,-67,-207]
H = [-555,787,764,-788]
df1 = pd.DataFrame({"A": A, "B": B, "C": C, "D": D})
df2 = pd.DataFrame({"E": E, "B": F, "C": G, "D": H})
If I do concat
with join=outer
, I get the following resulting dataframe:
pd.concat([data1,data2], join='outer')
If I do df1.combine_first(df2)
, I get the following:
df1.set_index('B').combine_first(df2.set_index('B')).reset_index()
If I do pd.merge(df1, df2)
, I get the following which is identical to the result produced by concat
:
pd.merge(data1, data2, on=['B','C','D'], how='outer')
And finally, if I do df1.join(df2, how='outer')
, I get the following:
df1.join(df2, how='outer', on='B', lsuffix='_left', rsuffix='_right')
I don't fully understand how and why each produces different results.