First, I'm trying to do this as a vectorized operation because the dataset is large.

import pandas as pd

sub_list = [{"uniqueId": "123456", "ref_idx": 1},
            {"uniqueId": "123457", "ref_idx": 2},
            {"uniqueId": "123458", "ref_idx": 3},
            {"uniqueId": "123459", "ref_idx": 4},
            {"uniqueId": "123460", "ref_idx": 5},
            {"uniqueId": "123461", "ref_idx": 6}]

primary_list = [{"uniqueId": "123456"},
                {"uniqueId": "123457"},
                {"uniqueId": "123458"},
                {"uniqueId": "123459"},
                {"uniqueId": "123460"},
                {"uniqueId": "123461"},
                {"uniqueId": "123462"},
                {"uniqueId": "123463"},
                {"uniqueId": "123464"},
                {"uniqueId": "123465"}]

subset_df = pd.DataFrame(sub_list)
primary_df = pd.DataFrame(primary_list)

subset_df.set_index("uniqueId", inplace=True)
primary_df.set_index("uniqueId", inplace=True)

primary_df["ref_idx"] = primary_df.loc([subset_df.index]["ref_idx"])

The issue is with the last statement. I've tried various ways of taking the ref_idx values from subset_df and populating them into primary_df as a new column. Of course, records that don't appear in subset_df will have no value (NaN) in primary_df. That's OK.

I'm just not sure of the correct syntax.

I'm seeing errors like `TypeError: list indices must be integers or slices, not str`, and `__call__() takes from 1 to 2 positional arguments but 3 were given`, which happens when one separates the arguments as ([subset_df.index], ["ref_idx"]).

Basically, use the index to qualify the data but return the ref_idx value.

Ideas?

vikrant rana
Fred
  • `primary_df['ref_idx'] = subset_df['ref_idx']` since it's a non-duplicated index – ALollz Jul 03 '19 at 19:28
  • It's certainly a case for `merge`, but as a quick fix for your last line: `primary_df.loc[subset_df.index, 'ref_idx'] = subset_df.ref_idx` – Quang Hoang Jul 03 '19 at 19:28
  • `primary_df.merge(subset_df['ref_idx'], on='uniqueId', how='outer')` – political scientist Jul 03 '19 at 19:30
  • Thanks ALollz and Quang Hoang, both options work. I had posed this question differently, and although the suggestion there was to do a "merge", the need for a vector operation eliminated that possibility due to performance. – Fred Jul 03 '19 at 20:55
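
For anyone landing here, the suggestions in the comments can be sketched roughly as follows. This is a minimal example, not a definitive answer: it uses a trimmed copy of the data from the question, assumes `uniqueId` is unique in `subset_df`, and the names `lookup` and `aligned` are only illustrative. Note that `.loc` is indexed with square brackets, with the row and column selectors inside the same pair of brackets.

```python
import pandas as pd

# Trimmed version of the question's data, already indexed by uniqueId.
subset_df = pd.DataFrame(
    {"uniqueId": ["123456", "123457", "123458"], "ref_idx": [1, 2, 3]}
).set_index("uniqueId")

primary_df = pd.DataFrame(
    {"uniqueId": ["123456", "123457", "123458", "123462", "123463"]}
).set_index("uniqueId")

# Lookup only: rows and column go inside one pair of square brackets.
lookup = subset_df.loc[["123456", "123458"], "ref_idx"]

# Index-aligned assignment: pandas matches rows on the index, so ids
# that are missing from subset_df end up as NaN in primary_df.
primary_df["ref_idx"] = subset_df["ref_idx"]

# The same alignment written out explicitly with reindex.
aligned = subset_df["ref_idx"].reindex(primary_df.index)

print(primary_df)
```

Because the new column contains NaN for the unmatched ids, it will typically come out as a float column rather than an integer one.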
