
Is there a way to improve my key column selection on two data frames? Below is what I've coded, and the results match my expectations.

I listed `key_columns` for reference, as I used it to drop duplicates instead of indicating the columns one by one.

Is there any way to shorten the selection inside the for loop or write it in a more Pythonic way? It would be a great help, as I will need to select more data than just QTY. Thank you.

key_columns = ['COMPANY', 'PLANT', 'MATERIAL', 'DATE']
dfunique = dfunique.drop_duplicates(key_columns)

# For each unique key combination, look up the matching 'BEFORE' quantity in df
for i, dfkey in dfunique.iterrows():
    try:
        dfunique.loc[i, 'BEFORE_QTY'] = df[df.COMPANY == dfkey.COMPANY] \
            [df.PLANT == dfkey.PLANT] \
            [df.MATERIAL == dfkey.MATERIAL] \
            [df.DATE == dfkey.DATE] \
            [df.ROW_TYPE == 'BEFORE'].QTY.item()
    except ValueError:
        # .item() raises ValueError when the selection is not exactly one row
        dfunique.loc[i, 'BEFORE_QTY'] = 0
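
For reference, here is a shorter version of the same lookup that I sketched with one combined boolean mask (I am not sure this is the best direction, and it assumes the same `df` and `dfunique` columns as above):

    # Same lookup as above, but with one combined mask instead of chained selections
    for i, dfkey in dfunique.iterrows():
        mask = (
            (df.COMPANY == dfkey.COMPANY)
            & (df.PLANT == dfkey.PLANT)
            & (df.MATERIAL == dfkey.MATERIAL)
            & (df.DATE == dfkey.DATE)
            & (df.ROW_TYPE == 'BEFORE')
        )
        qty = df.loc[mask, 'QTY']
        # Fall back to 0 when there is not exactly one matching 'BEFORE' row
        dfunique.loc[i, 'BEFORE_QTY'] = qty.item() if len(qty) == 1 else 0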

Sample data: Hello, my output expectation looks like the screenshot below; the logic is correct, I am just concerned about the selection. A question on the MERGE suggestion: should I break my table into two, BEFORE and AFTER, and just merge them?

[screenshot: sample data and expected output]
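
Here is a rough sketch of what I mean by splitting into BEFORE and AFTER and merging, on made-up sample data (column names as in my table above; `AFTER_QTY` is just an assumed name):

    import pandas as pd

    # Made-up sample data with the same columns as my raw table
    key_columns = ['COMPANY', 'PLANT', 'MATERIAL', 'DATE']
    df = pd.DataFrame({
        'COMPANY':  ['C1', 'C1'],
        'PLANT':    ['P1', 'P1'],
        'MATERIAL': ['M1', 'M1'],
        'DATE':     ['2021-04-01', '2021-04-01'],
        'ROW_TYPE': ['BEFORE', 'AFTER'],
        'QTY':      [10, 7],
    })

    # Split the raw table by ROW_TYPE and rename QTY on each side
    before = (df[df.ROW_TYPE == 'BEFORE']
              .rename(columns={'QTY': 'BEFORE_QTY'})[key_columns + ['BEFORE_QTY']])
    after = (df[df.ROW_TYPE == 'AFTER']
             .rename(columns={'QTY': 'AFTER_QTY'})[key_columns + ['AFTER_QTY']])

    # Outer join keeps keys that appear on only one side; missing quantities become 0
    result = (before.merge(after, on=key_columns, how='outer')
                    .fillna({'BEFORE_QTY': 0, 'AFTER_QTY': 0}))
    print(result)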

chuam2
  • would you provide some sample data for these two dataframes? – Ian Apr 11 '21 at 15:35
  • Is there something missing on the right side of assignment `df[df.company == dfkey.company] \ [df.PLANT == dfkey.PLANT] ....` – Joe Ferndz Apr 11 '21 at 18:05
  • did you try to do a merge of df and dfunique on the keys to get records you wanted? left or right join as per your needs may be a better fit than iterating through the whole dataframe – Joe Ferndz Apr 11 '21 at 18:07
  • Read more about merge join concat [here](https://stackoverflow.com/questions/53645882/pandas-merging-101) and [here](https://pandas.pydata.org/docs/user_guide/merging.html) – Joe Ferndz Apr 11 '21 at 18:08
  • For transform use `pivot` – jezrael Apr 12 '21 at 05:53
  • Thank you for that suggestion, Joe. I will check merge, but would that be efficient? I'm thinking of cutting the raw table into two, one filtered BEFORE and one filtered AFTER, then merging afterwards. I will update this thread once solved – chuam2 Apr 12 '21 at 06:04
  • Thank you, experts! The correct solution for my requirement is in the link above, question 10: `df.pivot_table(index=key_columns, columns='ROW_TYPE', values='QTY').reset_index()` – chuam2 Apr 12 '21 at 14:39
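
A minimal sketch of the `pivot_table` one-liner from the last comment, reusing the made-up `df` and `key_columns` from the merge sketch above:

    # Pivot QTY into one column per ROW_TYPE value (BEFORE / AFTER), one row per key
    wide = df.pivot_table(index=key_columns, columns='ROW_TYPE', values='QTY').reset_index()

    # Optional: match the BEFORE_QTY naming used in the loop version
    wide = wide.rename(columns={'BEFORE': 'BEFORE_QTY', 'AFTER': 'AFTER_QTY'})
    wide.columns.name = None
    print(wide)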

0 Answers