
I'm wondering what the difference is between merging with pd.merge() and merging with DataFrame.merge(). Examples below:

pd.merge(dataframe1, dataframe2)

and

dataframe1.merge(dataframe2)
ggorlen
    Does this answer your question? [What is the difference between join and merge in Pandas?](https://stackoverflow.com/questions/22676081/what-is-the-difference-between-join-and-merge-in-pandas) – huy Jun 02 '20 at 02:16

1 Answer


We have two functions at our disposal for almost the same task: pandas.merge() and DataFrame.merge().

pandas.merge(left, right, how='inner', on=None, left_on=None, right_on=None, 
          left_index=False, right_index=False, 
          sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

DataFrame.merge(right, how='inner', on=None, left_on=None, right_on=None, 
             left_index=False, right_index=False, 
             sort=False, suffixes=('_x', '_y'), copy=True, indicator=False, validate=None)

Both look similar, so what's the advantage of using one over the other?

DataFrame.merge() calls pd.merge() internally, passing the calling frame as the left argument, so df1.merge(df2) gives the same result as pd.merge(df1, df2).

However, pd.merge() is a wrapping-style function (calls nest inside each other) and df1.merge() is a chaining-style method, which makes the latter easier to read from left to right when several merges follow one another.

E.g.,

 df1.merge(df2).merge(df3)
 # reads better (analogous to the %>% pipe operator in R) than
 pd.merge(pd.merge(df1, df2), df3)
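To make the comparison concrete, here is a minimal sketch with three small made-up frames (the column names key, a, b and c are hypothetical, chosen only for illustration). It builds the same three-way inner join in both styles and checks that the results match:

```python
import pandas as pd

# Hypothetical toy frames sharing a "key" column (not from the original post).
df1 = pd.DataFrame({"key": [1, 2, 3], "a": ["x", "y", "z"]})
df2 = pd.DataFrame({"key": [1, 2, 4], "b": [10, 20, 40]})
df3 = pd.DataFrame({"key": [1, 3, 4], "c": [0.1, 0.3, 0.4]})

# Chaining style: each merge reads left to right.
chained = df1.merge(df2, on="key").merge(df3, on="key")

# Wrapping style: the first merge is nested inside the second.
nested = pd.merge(pd.merge(df1, df2, on="key"), df3, on="key")

print(chained)
print(chained.equals(nested))
```

Only key 1 appears in all three frames, so the default inner join keeps a single row in both cases.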

Let's look at a reproducible example:

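import pandas as pd

# read_html returns a list of all tables found on the page
d1 = pd.read_html('https://worldpopulationreview.com/countries')
pop = d1[0]
pop.info()  # prints the summary itself (returns None); data for 232 countries, 7 columns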

pop.head(3)

d2 = pd.read_html('https://worldpopulationreview.com/country-rankings/median-age')
age = d2[0]
age.info()  # prints the summary itself (returns None); data for 221 countries, 5 columns

age.head(3)

display('pd.merge(): ', pd.merge(pop, age), 'df.merge(): ', pop.merge(age))
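The example above depends on live web pages whose tables may change over time. As an offline sketch of the same comparison, the snippet below uses two small frames with made-up values (the column names country, population and median_age are assumptions, not the real scraped headers):

```python
import pandas as pd

# Made-up stand-ins for the scraped tables; values are illustrative only.
pop = pd.DataFrame({"country": ["A", "B", "C"], "population": [100, 200, 300]})
age = pd.DataFrame({"country": ["B", "C", "D"], "median_age": [30.5, 41.2, 25.0]})

# With no "on" argument, both forms join on the shared column "country".
via_function = pd.merge(pop, age)
via_method = pop.merge(age)

print(via_function)
print(via_function.equals(via_method))
```

Both calls use the default inner join, so only the countries present in both frames survive, and the two results compare equal.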
Dr Nisha Arora