1

Say I have the following DataFrame df1:

name    course   yob     city
paul    A        1995    london
john    A        2005    berlin
stacy   B        2015    vienna
mark    D        2013    madrid

And also the following DataFrame df2:

name    height   occupation   
paul    185      student    
mark    162      pilot

I want to combine them to obtain:

name    course   height   occupation   yob     city
paul    A        185      student      1995    london
john    A        NaN      NaN          2005    berlin
stacy   B        NaN      NaN          2015    vienna
mark    D        162      pilot        2013    madrid

So the idea is I have df1, which is my main data structure, and I want to insert the columns of df2 (which only has information regarding some of the names) in a specific location in df1 (namely in this case between the columns course and yob). The ordering of the columns is important, and shouldn't be changed.

What would be the most straightforward/elegant way of doing this?

lumpy
  • 45
  • 4

2 Answers2

2

Its not clear you want left or outer join. here is simple way for left join

I am using first dataframe as df1 and second dataframe as df2 for result

import pandas as pd

df_result = pd.merge (left=df1, right=df2, how='left', on='name')
# Reorder the columns
df_result = df_result[["name", "course", "height", "occupation", "yob", "city"]]

print(df_result)

If you want outer join

df_result = pd.merge (left=df1, right=df2, how='outer', on='name')
Dan
  • 43,452
  • 14
  • 75
  • 140
Suresh Mali
  • 308
  • 1
  • 4
  • 17
  • Thank you! I wasn't worried about the type of join, since df2 in my case always has a subset of the names in df1. My only problem with your answer is that the ordering for the columns that I specified in my question wouldn't be preserved. I need to insert the columns of df2 between two specific columns of df1. – lumpy Jun 28 '19 at 09:28
  • 2
    add stmt df_result = df_result[['name', 'course',....]] in the order you need – Suresh Mali Jun 28 '19 at 09:48
1

A generalized approach will be merge and then create a list with df2.columns excluding the matching columns in the middle of the list df1.columns and reindex():

final=df1.merge(df2,on='name',how='left')
l=list(df1.columns)
s=l[:len(l)//2]+list(df2.columns.difference(df1.columns))+l[len(l)//2:]
#['name', 'course', 'height', 'occupation', 'yob', 'city']

Then use reindex() on axis=1

final=final.reindex(s,axis=1)
print(final)

    name course  height occupation   yob    city
0   paul      A   185.0    student  1995  london
1   john      A     NaN        NaN  2005  berlin
2  stacy      B     NaN        NaN  2015  vienna
3   mark      D   162.0      pilot  2013  madrid
anky
  • 64,269
  • 7
  • 30
  • 56