
I'm new to pandas and I would like to know how I can join two files and update existing rows based on a specific column. The files have thousands of lines. For example:

  • Df_1:

    A  B  C  D 
    1  2  5  4 
    2  2  6  8  
    9  2  2  1  
    

Now, my second table has exactly the same columns. I want to join the two tables, replacing the rows that appear in both tables but whose column C was changed/updated in table 2, and adding the new rows that only exist in this second table (df_2). For example:

  • Df_2:

    A  B  C  D 
    2  2  7  8  
    9  2  3  1
    3  4  6  7
    1  2  3  4  
    

So, the result I want is the union of the two tables, with a few rows updated in that specific column, like this:

  • Df_result:

    A  B  C  D 
    1  2  5  4 
    2  2  7  8  
    9  2  3  1  
    3  4  6  7
    1  2  3  4
    

How can I do this with the merge or concatenate function? Or is there another way to get the result I want?

Thank you!

MafaldaCA
    Try going through this. It may help now and in the future. https://stackoverflow.com/questions/53645882/pandas-merging-101 – wwnde May 22 '20 at 09:57

1 Answer


You need at least one column (or set of columns) to use as a reference key, i.e. something that identifies which rows need to be updated.

Assuming that in your case the key is "A" and "B":

import pandas as pd

# columns that identify a row
ref = ['A', 'B']
# stack the two frames, then keep only the df_2 version of each duplicated key
df_result = pd.concat([df_1, df_2], ignore_index=True)
df_result = df_result.drop_duplicates(subset=ref, keep='last')

Here is a complete example:

d = {'col1': [1, 2, 3], 'col2': ["a", "b", "c"], 'col3': ["aa", "bb", "cc"]}
df1 = pd.DataFrame(data=d)
d = {'col1': [1, 4, 5], 'col2': ["a", "d", "f"], 'col3': ["dd","ee", "ff"]}
df2 = pd.DataFrame(data=d)

df_result = pd.concat([df1, df2], ignore_index=True)

df_result = df_result.drop_duplicates(subset=['col1','col2'], keep='last')
df_result

This prints:

       col1 col2 col3
    1     2    b   bb
    2     3    c   cc
    3     1    a   dd
    4     4    d   ee
    5     5    f   ff
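Applying the same recipe to the frames from the question (df_1 and df_2 are rebuilt here so the snippet runs on its own):

```python
import pandas as pd

# the frames from the question
df_1 = pd.DataFrame({'A': [1, 2, 9], 'B': [2, 2, 2],
                     'C': [5, 6, 2], 'D': [4, 8, 1]})
df_2 = pd.DataFrame({'A': [2, 9, 3, 1], 'B': [2, 2, 4, 2],
                     'C': [7, 3, 6, 3], 'D': [8, 1, 7, 4]})

ref = ['A', 'B']
df_result = pd.concat([df_1, df_2], ignore_index=True)
df_result = df_result.drop_duplicates(subset=ref, keep='last').reset_index(drop=True)
print(df_result)
```

One caveat: the rows `1 2 5 4` (from df_1) and `1 2 3 4` (from df_2) share the same A/B key, so `keep='last'` keeps only the df_2 version. If both rows should survive, as in your desired output, the reference columns you choose must be able to tell those two rows apart.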

Rafael