1

I have two data frames, one that is names of people born and their frequencies in each year (1880-2017).

name    gender  frequency  year
Mary       F       7065    1880
Anna       F       2604    1880
Emma       F       2003    1880
Elizabeth  F       1939    1880
Minnie     F       1746    1880
...

and the other is years and the total number of births (1880-2017).

 birth_year    Male    Female   Total
     1880     118400    97605  216005
     1881     108282    98855  207137
     1882     122031   115695  237726
     1883     112477   120059  232536
     1884     122738   137586  260324
...

These data frames are not the same size but I want to append the columns from the second data frame to the first data frame if the birth year is the same in order to include percentage population. I want to do something like this:

for i in range(len(all_names_nat_DF)):
    for j in range(len(total_births)):
        if all_names_nat_DF['year'][i] == total_births['birth_year']:
            all_names_nat_DF.append(total_births['birth_year'][j])

But with this I get the error ValueError: Can only compare identically-labeled Series objects

  • Try with merge . – BENY Mar 21 '19 at 15:51
  • What you're looking for is the concept of `joining`, which matches dataframes on a `key column` which is in your case `birth_year`. In pandas we use `merge` for that. I linked a nice Q&A explaining this concept. – Erfan Mar 21 '19 at 15:52

1 Answers1

2

You would want to use df.merge:

df

    name gender frequency year
0   Mary    F   7065    1880
1   Anna    F   2604    1880
2   Emma    F   2003    1880
3   Eliz    F   1939    1880
4   Minnie  F   1746    1880


births

  birth_year  Male  Female  Total
0   1880    118400  97605   216005
1   1881    108282  98855   207137
2   1882    122031  115695  237726
3   1883    112477  120059  232536
4   1884    122738  137586  260324

df.merge(births, how='inner', left_on='year', right_on='birth_year')

    name gender frequency year birth_year Male  Female  Total
0   Mary    F   7065    1880    1880    118400  97605   216005
1   Anna    F   2604    1880    1880    118400  97605   216005
2   Emma    F   2003    1880    1880    118400  97605   216005
3   Eliz    F   1939    1880    1880    118400  97605   216005
4   Minnie  F   1746    1880    1880    118400  97605   216005
ishan
  • 140
  • 11
  • Ishan Saraswat that looks like it worked thank you! May I ask how you knew to join left on year and right on birth_year? That isn't intuitive to me for some reason. – Nathan Cahn Mar 21 '19 at 17:48
  • @NathanCahn the ````left_on```` and ````right_on```` arguments simply denote the column or index level names to join on. If both the left and the right dataframes had a column named 'year', you could instead simply use the ````on```` argument (````on='year'````). The 'left' dataframe in this case would the one that you're calling merge on so in the line ````df.merge(births....)```` df is the left dataframe and births is the right dataframe. Refer to: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html – ishan Mar 21 '19 at 19:29
  • that clears it up thank you for your help! – Nathan Cahn Mar 22 '19 at 21:51