1
import pandas as pd

left = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'C': ['K0', 'K1', 'K0', 'K1']})

right = pd.DataFrame({'AA': ['A1', 'A3'],
                      'BB': ['B0', 'B3'],
                      'CC': ['K0', 'K1'],
                      'DD': ['D0', 'D1']})

I want to join these two data frames by adding column DD to left. The values of DD should be selected based on comparing A and AA, B and BB, C and CC.

The simple joining case would be as as shown below, but in my case I need to compare columns with different names, and then I want only add DD to right.

result = left.join(right, on='DD')

The result should be:

result = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A3'],
                       'B': ['B0', 'B1', 'B2', 'B3'],
                       'C': ['K0', 'K1', 'K0', 'K1'],
                       'DD': ['D0', NaN, NaN, 'D1']})
JoeBlack
  • 331
  • 2
  • 3
  • 10

2 Answers2

2

Use pandas merge method with left_on and right_on parameters.

left.merge(right, how='left',
          left_on=['A', 'B', 'C'],
          right_on=['AA', 'BB', 'CC'])[['A', 'B', 'C', 'DD']]

gets you:

    A   B   C   DD
0  A1  B0  K0   D0
1  A1  B1  K1  NaN
2  A2  B2  K0  NaN
3  A3  B3  K1   D1
piRSquared
  • 240,659
  • 38
  • 359
  • 510
1

It looks like you want to merge.

However at the moment the columns names don't match up (A is AA in right).
So first let's normalize them:

In [11]: right.columns = right.columns.map(lambda x: x[0])

Then we can merge on the shared columns:

In [12]: left.merge(right)
Out[12]:
    A   B   C   D
0  A1  B0  K0  D0
1  A3  B3  K1  D1

In [13]: left.merge(right, how="outer")
Out[13]:
    A   B   C    D
0  A1  B0  K0   D0
1  A1  B1  K1  NaN
2  A2  B2  K0  NaN
3  A3  B3  K1   D1
Andy Hayden
  • 291,328
  • 80
  • 565
  • 500
  • Do I also need to normalize columns of `left`? – JoeBlack May 26 '16 at 17:31
  • @JoeBlack the code above is working code. but it depends what your real life example is (and why the same columns are differently labelled in different DataFrames, to me that looks like something I would want to correct before merging/starting calculations). – Andy Hayden May 26 '16 at 17:34