How to compare 2 Pandas dataframes and add a new column based on comparison

Question

I have two dataframes that require comparing the ticket_id column. If there is a match, I need to add a column to the first dataframe that comes from a column from the second dataframe. If there is no match between the first dataframe and second dataframe, then that means there is a new row in the second dataframe that needs to be added to the first.

I have tried using if statements but have unsuccessfully been able to compare them.

df_A (current week help desk ticket report)

ticket_id   category   submitted  closed   status   
1           critical   4/20/19    5/1/19   closed
2           low        4/23/19    5/2/19   closed
3           medium     4/26/19             open
4           low        5/1/19              open

df_B (previous week help desk ticket report)

ticket_id   category   submitted  closed   status   
1           critical   4/20/19             open
2           low        4/23/19             open
3           medium     4/26/19             open

So I essentially want to make a new dataframe based on df_A but take the previous week status for that ticket ID and add it to the new dataframe as the last column. If a new ticket appears from previous week to new week (i.e. ticket_id = 4) then it should be appended and there should be a status of NA or blank (doesn't really matter).

expected df_A

ticket_id   category   submitted  closed   status  previous_week_status  
1           critical   4/20/19    5/1/19   closed  open
2           low        4/23/19    5/2/19   closed  open
3           medium     4/26/19             open    open
4           low        5/1/19              open    NA

is `ticket_id` really an ID? i.e. is there only one row per ticket id in each of your dataframes? — Juan C, May 02 '19 at 16:54
it is an ID i just used 1-4 to simply it but it actually looks something like 'IR2382' for a given ticket. it is the primary key of each row. — prettypython, May 02 '19 at 16:55
Possible duplicate of [Compare Multiple Columns to Get Rows that are Different in Two Pandas Dataframes](https://stackoverflow.com/questions/33524000/compare-multiple-columns-to-get-rows-that-are-different-in-two-pandas-dataframes) — HS-nebula, May 02 '19 at 17:04
Possible duplicate of [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) — Erfan, May 02 '19 at 17:14

score 0 · Answer 1 · answered May 02 '19 at 17:01

0

This should do:

df_A.set_index('ticket_id', inplace=True)
df_B.set_index('ticket_id', inplace=True)
df_A['previous week status']=df_B.status

answered May 02 '19 at 17:01

Juan C

3,992
1
10
33

score 0 · Answer 2 · answered May 02 '19 at 17:45

As @Erfan already indicated, it's probably best to solve this by renaming and merging the dataframe.

df_B_reduced=(df_B.rename(columns={"status":"previous_week_status"})
                .drop(["category","submitted", "closed"]) # drop duplicate info
              )
df_merged=df_A.merge(right=df_B_reduced,
                     how='left', # if an entry is in A, but not in B, add NA values
                     on=["ticket_id"],  # property to merge on
                     validate="one_to_one" # (optional) Check that your ticket_id is actually a unique id
                    )

For some more information, look at Pandas Merging 101 or the official documentation.

How to compare 2 Pandas dataframes and add a new column based on comparison

2 Answers2