I have two dataframes that I want to merge / join on multiple index columns and I am unsure how to do it.
My first dataframe looks as follows:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], 6),
'Date': np.repeat(pd.date_range(start='2014-01-01', end='2014-01-30', freq='3D')[:-1], 2),
'var1': np.random.randn(18),
'var2': np.random.randn(18)})
df1
Out[1]:
Type Date var1 var2
0 A 2014-01-01 1.158068 -0.010431
1 A 2014-01-01 0.052350 0.779531
2 A 2014-01-04 0.321079 -2.103932
3 A 2014-01-04 -0.796583 0.528920
4 A 2014-01-07 -0.868004 -0.170354
5 A 2014-01-07 -1.090472 -0.116811
6 B 2014-01-10 -0.659798 3.390008
7 B 2014-01-10 -1.054428 0.016943
8 B 2014-01-13 0.853301 0.292624
9 B 2014-01-13 0.814400 -1.213494
10 B 2014-01-16 0.307140 1.866017
11 B 2014-01-16 0.652235 0.431757
12 C 2014-01-19 1.030088 0.646554
13 C 2014-01-19 0.163201 0.974505
14 C 2014-01-22 0.517568 -0.197154
15 C 2014-01-22 -1.252889 0.335804
16 C 2014-01-25 -1.670459 1.335267
17 C 2014-01-25 -0.504036 -1.966568
The dates in the Date column here are not unique; they can occur multiple times even within the same Type.
My second dataframe acts as a "lookup table" for adding more variables to df1. In it, each Date is unique within a Type (though the same date occurs once for every Type). Here it is:
dates = df1.Date.unique()
df2 = pd.DataFrame({'Type': np.repeat(['A', 'B', 'C'], len(dates)),
'Date': np.tile(dates, 3),
'var3': np.random.randn(len(dates)*3),
'var4': np.random.randn(len(dates)*3)})
df2
Out[2]:
Type Date var3 var4
0 A 2014-01-01 -1.174070 -0.385655
1 A 2014-01-04 0.297007 0.993622
2 A 2014-01-07 0.022896 0.273704
3 A 2014-01-10 -1.392180 -1.100206
4 A 2014-01-13 -0.681359 2.784101
5 A 2014-01-16 0.228933 0.699884
6 A 2014-01-19 -1.048384 0.649192
7 A 2014-01-22 -1.873971 2.479376
8 A 2014-01-25 0.765804 0.579625
9 B 2014-01-01 -0.274890 0.000616
10 B 2014-01-04 -0.962112 -0.165101
11 B 2014-01-07 1.547124 -0.988169
12 B 2014-01-10 0.140017 2.038517
13 B 2014-01-13 0.436132 -0.727510
14 B 2014-01-16 0.490690 -0.461119
15 B 2014-01-19 0.660947 0.962961
16 B 2014-01-22 -1.153902 0.436929
17 B 2014-01-25 -0.171737 -2.704890
18 C 2014-01-01 0.153577 0.261392
19 C 2014-01-04 0.367761 0.899686
20 C 2014-01-07 -1.766651 0.286800
21 C 2014-01-10 -0.536036 0.322947
22 C 2014-01-13 -1.517577 3.400519
23 C 2014-01-16 0.271890 0.460464
24 C 2014-01-19 1.590970 0.380616
25 C 2014-01-22 0.693004 -0.647738
26 C 2014-01-25 0.797017 0.642877
So what I want to do now is:
- Take df1, and for each row:
  - Look up the corresponding ['Type', 'Date'] combination in df2
  - Fill in the respective var3 and var4 values in df1.
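The lookup described in the steps above can be sketched as a left merge on both key columns. This is a minimal sketch under stated assumptions, not necessarily the best approach: it rebuilds frames shaped like df1 and df2 with a seeded generator (`rng` and the seed are illustrative, not part of the original data), and `validate="many_to_one"` is an optional check that df2 really has one row per ['Type', 'Date'] pair.

```python
import numpy as np
import pandas as pd

# Illustrative data shaped like df1 / df2 above (seeded, so values differ from the post)
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", periods=9, freq="3D")

df1 = pd.DataFrame({
    "Type": np.repeat(["A", "B", "C"], 6),
    "Date": dates.repeat(2),
    "var1": rng.standard_normal(18),
    "var2": rng.standard_normal(18),
})
df2 = pd.DataFrame({
    "Type": np.repeat(["A", "B", "C"], len(dates)),
    "Date": np.tile(dates.to_numpy(), 3),
    "var3": rng.standard_normal(len(dates) * 3),
    "var4": rng.standard_normal(len(dates) * 3),
})

# Left merge: keep every row of df1 and pull in var3 / var4 from the
# matching (Type, Date) row of df2. validate raises if df2 has duplicate keys.
merged = df1.merge(df2, on=["Type", "Date"], how="left", validate="many_to_one")
print(merged.head())
```

With `how="left"`, rows of df1 that have no match in df2 would be kept with NaN in var3 and var4 rather than dropped.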
I normally do these kinds of operations with df1.join(df2, on=['Type', 'Date']). But for some reason it throws
ValueError: len(left_on) must equal the number of levels in the index of "right"
I'm not sure what the problem is here, or how to do this in an easier fashion.
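A likely reading of the error: DataFrame.join with `on=` matches the named columns of the calling frame against the index of the other frame, and df2 as built here has only a default single-level index, so two key columns cannot line up with it. A minimal sketch, assuming that is the cause, moves the two key columns of df2 into its index before joining. The small frames and the name `lookup` below are illustrative only.

```python
import pandas as pd

# Small illustrative frames with the same key columns as in the question
df1 = pd.DataFrame({
    "Type": ["A", "A", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-01", "2014-01-04"]),
    "var1": [1.0, 2.0, 3.0],
})
df2 = pd.DataFrame({
    "Type": ["A", "A", "B", "B"],
    "Date": pd.to_datetime(["2014-01-01", "2014-01-04", "2014-01-01", "2014-01-04"]),
    "var3": [10.0, 20.0, 30.0, 40.0],
})

# join(on=...) looks the left columns up in the right frame's index,
# so give the right frame a two-level (Type, Date) index first.
lookup = df2.set_index(["Type", "Date"])
joined = df1.join(lookup, on=["Type", "Date"])
print(joined)
```

The original index of df1 is preserved, and the key columns stay as ordinary columns in the result.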