I have two dataframes that I want to merge / join on multiple index columns and I am unsure how to do it.

My first dataframe looks as follows:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], 6),
                    'Date': np.repeat(pd.date_range(start='2014-01-01', end='2014-01-30', freq='3D')[:-1], 2),
                    'var1': np.random.randn(18),
                    'var2': np.random.randn(18)})
df1
Out[1]:

    Type    Date    var1    var2
0   A   2014-01-01  1.158068    -0.010431
1   A   2014-01-01  0.052350    0.779531
2   A   2014-01-04  0.321079    -2.103932
3   A   2014-01-04  -0.796583   0.528920
4   A   2014-01-07  -0.868004   -0.170354
5   A   2014-01-07  -1.090472   -0.116811
6   B   2014-01-10  -0.659798   3.390008
7   B   2014-01-10  -1.054428   0.016943
8   B   2014-01-13  0.853301    0.292624
9   B   2014-01-13  0.814400    -1.213494
10  B   2014-01-16  0.307140    1.866017
11  B   2014-01-16  0.652235    0.431757
12  C   2014-01-19  1.030088    0.646554
13  C   2014-01-19  0.163201    0.974505
14  C   2014-01-22  0.517568    -0.197154
15  C   2014-01-22  -1.252889   0.335804
16  C   2014-01-25  -1.670459   1.335267
17  C   2014-01-25  -0.504036   -1.966568

The dates in the Date column here are not unique; they can occur multiple times, even within the same Type.

My second dataframe is a kind of "lookup table" used to fill more variables into df1. Here, dates are unique within each Type (but every date occurs once for each Type). Here it is:

dates = df1.Date.unique()

df2 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], len(dates)),
                    'Date': np.tile(dates, 3),
                    'var3': np.random.randn(len(dates)*3),
                    'var4': np.random.randn(len(dates)*3)})
df2
Out[2]:
    Type    Date    var3    var4
0   A   2014-01-01  -1.174070   -0.385655
1   A   2014-01-04  0.297007    0.993622
2   A   2014-01-07  0.022896    0.273704
3   A   2014-01-10  -1.392180   -1.100206
4   A   2014-01-13  -0.681359   2.784101
5   A   2014-01-16  0.228933    0.699884
6   A   2014-01-19  -1.048384   0.649192
7   A   2014-01-22  -1.873971   2.479376
8   A   2014-01-25  0.765804    0.579625
9   B   2014-01-01  -0.274890   0.000616
10  B   2014-01-04  -0.962112   -0.165101
11  B   2014-01-07  1.547124    -0.988169
12  B   2014-01-10  0.140017    2.038517
13  B   2014-01-13  0.436132    -0.727510
14  B   2014-01-16  0.490690    -0.461119
15  B   2014-01-19  0.660947    0.962961
16  B   2014-01-22  -1.153902   0.436929
17  B   2014-01-25  -0.171737   -2.704890
18  C   2014-01-01  0.153577    0.261392
19  C   2014-01-04  0.367761    0.899686
20  C   2014-01-07  -1.766651   0.286800
21  C   2014-01-10  -0.536036   0.322947
22  C   2014-01-13  -1.517577   3.400519
23  C   2014-01-16  0.271890    0.460464
24  C   2014-01-19  1.590970    0.380616
25  C   2014-01-22  0.693004    -0.647738
26  C   2014-01-25  0.797017    0.642877

So what I want to do now is:

  1. Take df1, for each row:
  2. Look up the corresponding ['Type', 'Date']-Combination in df2
  3. Fill in the respective var3 and var4 values in df1.

I normally do these kinds of operations with df1.join(df2, on=['Type', 'Date']), but for some reason it throws:

ValueError: len(left_on) must equal the number of levels in the index of "right"

I'm not sure what the problem is here, or whether there is an easier way to do this.

cripcate

1 Answer


You may be looking for merge:

df2.merge(df1, on = ['Type', 'Date'], how = 'right')

It's more convenient to use merge when you are not joining on the index; check out this response for more details.
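As background on the error: `DataFrame.join` with `on=` matches the named columns of the calling frame against the index of the other frame. df2 has only its default one-level index, so two key columns cannot be matched against it. A minimal sketch of the merge route follows, assuming you want df1's rows and row order preserved, which is why it uses `how="left"` with df1 on the left instead of the `how='right'` form above. The seeded random generator and the `validate` check are additions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Rebuild frames shaped like the ones in the question (seeded so runs repeat).
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", periods=9, freq="3D")

df1 = pd.DataFrame({
    "Type": np.repeat(["A", "B", "C"], 6),
    "Date": dates.repeat(2),            # each date appears twice
    "var1": rng.standard_normal(18),
    "var2": rng.standard_normal(18),
})
df2 = pd.DataFrame({
    "Type": np.repeat(["A", "B", "C"], len(dates)),
    "Date": np.tile(dates.to_numpy(), 3),  # every date once per Type
    "var3": rng.standard_normal(27),
    "var4": rng.standard_normal(27),
})

# Left merge keeps every row of df1 and looks up var3/var4 in df2.
# validate="many_to_one" raises if a (Type, Date) pair is duplicated in df2,
# which would otherwise silently multiply rows.
merged = df1.merge(df2, on=["Type", "Date"], how="left", validate="many_to_one")
print(merged.head())
```

With `how="left"` the result has one row per row of df1, and any (Type, Date) pair missing from df2 would show up as NaN in var3 and var4 rather than being dropped.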

Ayoub ZAROU
  • ah okay, yeah, that works. I've also got it to work when I `.set_index(['Type', 'Date'])` for each of the DataFrames before joining. But I guess merge is much more convenient then. I still have to check if everything merged the way I want it, but it looks good so far. Thanks. I imagined it to be more complicated :D – cripcate Jul 29 '19 at 13:16
  • Glad I've been of help – Ayoub ZAROU Jul 29 '19 at 13:17
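
The `set_index` route mentioned in the first comment can be sketched as follows. This is a toy illustration with made-up frames (`left` and `lookup` are hypothetical names, not the question's data). It relies on `join` matching the `on` columns of the caller against the index of the other frame, so only the lookup table needs the two-level index here.

```python
import pandas as pd

# Small stand-ins for df1 and df2; the values are invented for illustration.
left = pd.DataFrame({
    "Type": ["A", "A", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-04"]),
    "var1": [1.0, 2.0, 3.0],
})
lookup = pd.DataFrame({
    "Type": ["A", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-04"]),
    "var3": [10.0, 20.0],
})

# Give the lookup table a (Type, Date) MultiIndex so that the two key
# columns named in `on` line up with its two index levels.
joined = left.join(lookup.set_index(["Type", "Date"]), on=["Type", "Date"])
print(joined)
```

`join` defaults to a left join, so `joined` keeps the rows and index of `left` and gains the `var3` column.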