0

I have df1:

index  Year  var1  var2  var3
0      2010  8     9     10
1      2011  11    12    13
.
.
998    2012  4     1     55
999    2011  19    10    18

and I have df2

index  var4
2010   7.5
2011   5.7
2012   4.9

How can I create a new column in df1 called var4 that matches the column Year with index of df2 and return var4. I have different rows for two tables and in my data years are between 2008 to 2018.

  • 1
    oh I am totally uneducated about merges. I think it is helpful. I am going to read the answers. Thanks for your help. – ricecooker Feb 23 '21 at 20:55

2 Answers2

0

I found an answer actually but it looks lame. Please let me know if you have a better option.

df1['var4'] = np.where(df['Year'] == 2010, df2[2010],
              np.where(df['Year'] == 2011, df2[2011],
              np.where(df['Year'] == 2012, df2[2012], 'NaN')))
0

As suggested, use merge:

df1 = pd.DataFrame({"Year": np.random.randint(2010, 2013, size=10),
                    "var1": np.random.randint(20, 40, size=10),
                    "var2": np.random.randint(30, 50, size=10),
                    "var3": np.random.randint(10, 60, size=10)}).rename_axis("index")

df2 = pd.DataFrame({"var4": [7.5, 5.7, 4.9]},
                   index=[2010, 2011, 2012]).rename_axis("index")
>>> pd.merge(df1, df2, left_on="Year", right_index=True, how="left")
       Year  var1  var2  var3  var4
index
0      2010    39    30    19   7.5
1      2012    39    34    51   4.9
2      2010    38    44    12   7.5
3      2011    37    31    39   5.7
4      2012    29    41    42   4.9
5      2010    29    49    57   7.5
6      2011    22    48    13   5.7
7      2011    36    31    53   5.7
8      2011    31    31    56   5.7
9      2010    25    41    43   7.5
Corralien
  • 3,229
  • 1
  • 3
  • 16