2

Considering two tables: one which contains all the indices, and another which contains data for some indices. The purpose is to have an entry for all index, regardless of it having data, and if it does, then add it. Similar to a LEFT OUTER JOIN in SQL. The indices table, named dates, could be something like:

     name week
0  name_0    0
1  name_0    1
2  name_0    2
3  name_0    3
4  name_0    4
5  name_0    5
6  name_1    0
7  name_1    1
8  name_1    2

so the table would contain 6 weeks for name_0, and 3 weeks for name_1. Whereas the data table, named data_holes, could be:

     name week otherdata
0  name_0    1         2
1  name_0    2         5
2  name_0    5        20
3  name_1    1         5

The desired merge result would be:

name week other_data
0  name_0    0     N/A
1  name_0    1      2
2  name_0    2      5
3  name_0    3     N/A    
4  name_0    4     N/A
5  name_0    5      20
6  name_1    0     N/A
7  name_1    1     N/A
8  name_1    2      5

However, when using dates.merge(data_holes, how='left', on=['name','week']) I am faced with an error:

ValueError: The column label 'week' is not unique.

GAMT312
  • 25
  • 1
  • 5

2 Answers2

6

I tried both:

  • dates.merge(data_holes, how='left') and
  • dates.merge(data_holes, how='left', on=['name','week'])

and got the proper result. Actually, on=['name','week']) is not needed, because merge by default merges on all common columns.

Maybe you have some "too old" version of Python or Pandas? I have:

  • python: 3.7.0.final.0
  • pandas: 0.24.2
  • numpy: 1.16.3

If you have older versions, upgrade to the indicated above.

Valdi_Bo
  • 24,530
  • 2
  • 17
  • 30
  • 1
    Thank you for your reply! However, as I pointed out, it was a typo on my side, that is why it was not working. Sorry for wasting your time. I will mark your reply as the answer so as to close the thread. – GAMT312 Jul 22 '19 at 16:35
3

I am aware that the question has already been answered but the last time the error about one of the columns mentioned not being unique on a merge operation came up I found out it was because in one of the dataframes to be merged it was duplicated.

For example in the following code:

df1.merge(df2, how='left', on=['col1','col2'])

If the error states that col2 is not unique perhaps 2 or more columns in either dataframe df1 or df2 have the name 'col2'. The column values don't necessarily have to be completely the same for the error to be thrown just 2 or more columns having the same name on an 'on' parameter when the dataframe is about to undergo a merge. If the answer above doesn't solve it, check this too.

angwalt
  • 41
  • 2