0

I would like to add the missing row dataframe df1 and keep the extra columns information

In [183]: df1
Out[183]:
           City       Country     Region
0       Chicago           US      N.America 
1  San Franciso           US      N.America 
2        Boston           US      N.America 
3        London           UK      Europe
4        Beijing          China   Asia  
5         Omaha           US      N.America


In [183]: df2
Out[183]:
           City          
0       Chicago            
1  San Franciso 
2        Sao Paulo       
3        Boston                 
4        London            
5        Beijing 
6        Tokyo             
7        Omaha           

The desired result after the merge is

           City       Country     Region
0       Chicago           US      N.America 
1  San Franciso           US      N.America 
2     Sao Paulo           nan     nan
3        Boston           US      N.America 
4        London           UK      Europe
5        Beijing          China   Asia 
6         Tokyo           nan     nan
7         Omaha           US      N.America

I am trying with pd.merge(df2, df1, on='City', how='outer') but return keyerror.

bkcollection
  • 793
  • 9
  • 29
  • The answer not necessary to use merge. As long as City in `df2` that not in `df1` add into while maintaining the columns data in df1 – bkcollection Dec 21 '20 at 11:11
  • I think solution `pd.merge(df2, df1, on='City', how='outer')` is good here. What are `df1.columns` and `df2.columns` ? Are there in both `City` column? – jezrael Dec 21 '20 at 11:13
  • Or you need `pd.merge(df2, df1, on='City', how='left')` ? – jezrael Dec 21 '20 at 11:21
  • both return keyeror on `City`. Is it important position of `City` columns. Either 1st column or 2nd column matters? – bkcollection Dec 21 '20 at 11:25
  • Not sure if undertand, what is `print (df1.columns)` and `print (df2.columns)` ? – jezrael Dec 21 '20 at 11:30

1 Answers1

1

Try the code below, using pd.merge, left_join, your desired output:

merged = pd.merge(df2,df1,how='left',on='City')
print(merged)
             City Country     Region
0         Chicago      US  N.America
1  San Fransicsco     NaN        NaN
2       Sao Paolo     NaN        NaN
3          Boston      US  N.America
4          London      UK     Europe
5         Beijing   China       Asia
6           Tokyo     NaN        NaN
7           Omaha      US  N.America

If you want to use an outer join, you can get this result using the below code:

merged_outer = pd.merge(df2, df1, on='City', how='outer')

print(merged_outer)

             City Country     Region
0         Chicago      US  N.America
1  San Fransicsco     NaN        NaN
2       Sao Paolo     NaN        NaN
3          Boston      US  N.America
4          London      UK     Europe
5         Beijing   China       Asia
6           Tokyo     NaN        NaN
7           Omaha      US  N.America
8    San Franciso      US  N.America

DF1 & DF2 respectively:

df1
 
           City Country     Region
0       Chicago      US  N.America
1  San Franciso      US  N.America
2        Boston      US  N.America
3        London      UK     Europe
4       Beijing   China       Asia
5         Omaha      US  N.America


df2

             City
0         Chicago
1  San Fransicsco
2       Sao Paolo
3          Boston
4          London
5         Beijing
6           Tokyo
7           Omaha
jezrael
  • 629,482
  • 62
  • 918
  • 895
sophocles
  • 5,806
  • 3
  • 8
  • 25