1

I have two dataframes that require comparing the ticket_id column. If there is a match, I need to add a column to the first dataframe that comes from a column from the second dataframe. If there is no match between the first dataframe and second dataframe, then that means there is a new row in the second dataframe that needs to be added to the first.

I have tried using if statements but have unsuccessfully been able to compare them.

df_A (current week help desk ticket report)

ticket_id   category   submitted  closed   status   
1           critical   4/20/19    5/1/19   closed
2           low        4/23/19    5/2/19   closed
3           medium     4/26/19             open
4           low        5/1/19              open

df_B (previous week help desk ticket report)

ticket_id   category   submitted  closed   status   
1           critical   4/20/19             open
2           low        4/23/19             open
3           medium     4/26/19             open

So I essentially want to make a new dataframe based on df_A but take the previous week status for that ticket ID and add it to the new dataframe as the last column. If a new ticket appears from previous week to new week (i.e. ticket_id = 4) then it should be appended and there should be a status of NA or blank (doesn't really matter).

expected df_A

ticket_id   category   submitted  closed   status  previous_week_status  
1           critical   4/20/19    5/1/19   closed  open
2           low        4/23/19    5/2/19   closed  open
3           medium     4/26/19             open    open
4           low        5/1/19              open    NA
  • is `ticket_id` really an ID? i.e. is there only one row per ticket id in each of your dataframes? – Juan C May 02 '19 at 16:54
  • it is an ID i just used 1-4 to simply it but it actually looks something like 'IR2382' for a given ticket. it is the primary key of each row. – prettypython May 02 '19 at 16:55
  • Possible duplicate of [Compare Multiple Columns to Get Rows that are Different in Two Pandas Dataframes](https://stackoverflow.com/questions/33524000/compare-multiple-columns-to-get-rows-that-are-different-in-two-pandas-dataframes) – HS-nebula May 02 '19 at 17:04
  • Possible duplicate of [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – Erfan May 02 '19 at 17:14

2 Answers2

0

This should do:

df_A.set_index('ticket_id', inplace=True)
df_B.set_index('ticket_id', inplace=True)
df_A['previous week status']=df_B.status
Juan C
  • 3,992
  • 1
  • 10
  • 33
0

As @Erfan already indicated, it's probably best to solve this by renaming and merging the dataframe.

df_B_reduced=(df_B.rename(columns={"status":"previous_week_status"})
                .drop(["category","submitted", "closed"]) # drop duplicate info
              )
df_merged=df_A.merge(right=df_B_reduced,
                     how='left', # if an entry is in A, but not in B, add NA values
                     on=["ticket_id"],  # property to merge on
                     validate="one_to_one" # (optional) Check that your ticket_id is actually a unique id
                    )

For some more information, look at Pandas Merging 101 or the official documentation.

Ivo Merchiers
  • 1,195
  • 6
  • 22