5

I am trying to join to data frames. They look like this

DF1 = ID     COUNTRY     YEAR     V1     V2     V3    V4
      12     USA         2012     x      y      z      a
      13     USA         2013     x      y      z      a
      14     RUSSIA      2012     x      y      z      a

DF2 = ID     COUNTRY     YEAR     TRACT
      9      USA         2000       A
      13     USA         2013       B

The desired end goal is:

DF3 = ID     COUNTRY     YEAR     V1     V2     V3    V4    TRACT    
      9      USA         2000                                 A
      12     USA         2012     x      y      z      a
      13     USA         2013     x      y      z      a      B
      14     RUSSIA      2012     x      y      z      a

I've been trying to use the pd.merge and the .join function with the on='outer' setting to no success

df3 = pd.merge(df1,df2,how='outer',left_on=['ID','Country','Year'],right_on=['ID',"Country","Year"])
bjurstrs
  • 769
  • 2
  • 6
  • 16

2 Answers2

9

try this:

df.merge(df2,how='outer',left_on=['ID','COUNTRY','YEAR'],right_on=['ID',"COUNTRY","YEAR"])

(the column names should be in caps based on your input tables)

JAB
  • 10,561
  • 5
  • 38
  • 46
1

Have you tried

df1.join(df2)

You can add parameters later, but it should work.

Harvey
  • 547
  • 5
  • 17
  • I had a similar problem that turned out to be pandas not correctly typing the index field. In read_csv, I set the dtype of the index field to str type, but forgot to set the engine='c', so it didn't work. Ram immediately maxed out and the machine locked until throwing a memory error several hours later. Run time after fixing was about 2 minutes including writing a 60Mb file. Pandas should at least throw a warning, but doesn't. – Harvey Mar 16 '15 at 01:12