0

Looking for help in merging the following two data-frames.

df1:
key     mean    stddev
---     -----   -------
key1    30          3
key2    40          4
key3    50          5

df2:
key     mean    stddev
---     -----   -------
key2    70          7
key3    80          8
key4    90          9

The above two dataframes need to be merged to produce three different results:

1. Rows that are in df1 but not in df2:


    key      mean     stddev
    ---      -----   -------
    key1     30       3

2. Rows that are in both df1 and df2  (please see the new columns delta_mean(difference of mean), delta_stddev (difference of stddev)

    key    mean_x   stddev   mean_y   stddev_y    delta_mean   delta_stddev
    ---    -----   -------   ------   --------    ---------    -----------
    key2    40          4     70       7            30          3
    key3    50          5     80       8            30          3

Additionally, I would like to sort it by the delta_mean.

3. Rows that are in df2 but  but not not in df1

key      mean   stddev
---       -----   -------
key4     90          9
khuss
  • 195
  • 2
  • 10
  • Does this answer your question? [Pandas Merging 101](https://stackoverflow.com/questions/53645882/pandas-merging-101) – noah Jan 21 '21 at 01:03

1 Answers1

1

In the first and third desired output, you can simply filter the keys from the current dataframe that is not present in the another. You need a merge just in the second option:

1. Rows that are in df1 but not in df2:

df1[~df1.key.isin(df2.key)]
    key  mean  stddev
0  key1    30       3

2. Rows that are in both df1 and df2

df1.merge(df2, on='key')
    key  mean_x  stddev_x  mean_y  stddev_y
0  key2      40         4      70         7
1  key3      50         5      80         8

3. Rows that are in df2 but but not not in df1

df2[~df2.key.isin(df1.key)]
    key  mean  stddev
2  key4    90       9
Cainã Max Couto-Silva
  • 3,873
  • 1
  • 7
  • 29
  • In the second case, there are are additional columns for calculating the difference between mean and stddev of both data frames. Is there a way to add those additional columns during the merge operation? – khuss Jan 21 '21 at 02:03
  • You mean there are other columns to merge? If so, you can use `df1.merge(df2, on=['key1', 'key2', 'key'3])` and so on... – Cainã Max Couto-Silva Jan 21 '21 at 20:04
  • No. The result should have additional columns. Thinking more about this, I found this way: `df['delta_mean'] = df['mean_x'] - df['mean_y']`. Wondering if this could be done during the merge stage as well. – khuss Jan 21 '21 at 22:24
  • As far as I know, the pandas merge function doesn't provide such facility. As a possible way to do that, you can get the common keys with `common_keys = df1.key[df1.key.isin(df2.key)]`, then calculate the values across common columns with: `df1.set_index('key').reindex(common_keys) - df2.set_index('key').reindex(common_keys)`. – Cainã Max Couto-Silva Jan 22 '21 at 08:34