1

I have two pandas dataframes, df1 and df2. Both having atleast two columns with the same name (c1 and c2) and then there are other columns, such as:

df1
-----------------
c1  c2  c4   c5
-----------------
10  1   hh  2231
11  1   fgf 2142
12  1   fg  1232

df2
-----------------
c1  c2   c6  c7
-----------------
10  2   110  231
10  3   111  332
11  2   112  123
11  3   113  432
12  2   114  432
12  3   115  432
13  2   116  432
13  3   117  432
14  2   118  432
14  3   119  432

I want to merge the two dataframes such that:

  • There is a union of c1 and c2 from both the df
  • c4 and c5 are repeated for all the instances of their corresponding c1 and c2 values from df1
  • The remaining columns, i.e., c6 and c7 are matched against c1 and c2 in df2 and matched to the c1 value in df1 and then added to the resulting df.

In a previous question, I got the answer to perform outer join and I performed outer join based on c1 and c2 and that returns the following for example:

c1  c2  c4  c5      c6      c7
--------------------------------
10  1   hh  2231        
10  2               110     231
10  3               111     332

However, I would like to repeat the values of c4 and c5 as the resulting df as follows:

resulting dataframe:

c1  c2  c4  c5           c6     c7
-----------------------------------
10  1   hh  2231        
10  2   hh  2231        110     231
10  3   hh  2231        111     332
11  1   fgf 2142        
11  2   fgf 2142        112     123
11  3   fgf 2142        113     432
12  1   fg  1232        
12  2   fg  1232        114     432
12  3   fg  1232        115     432
13  1   dd  4532        
13  2   dd  4532        116     432
13  3   dd  4532        117     432
14  2                   118     432
14  3                   119     432

Can anyone help me in this matter? Thanks in advance!

Hanif
  • 267
  • 2
  • 14
  • Same as in the link above, except that you will be merging on 2 columns, not one. This is similar to your previous question, correct? – cs95 Mar 07 '18 at 08:25
  • @cᴏʟᴅsᴘᴇᴇᴅ The difference between the previous question and this one is that in the previous question I get the result as shown in the last dataframe (above) but as i mention in this question, I would like to have some columns repeating and not have missing values when the merge is performed. So i think its not a duplicate question but a further enhancement of the previous question. Can you help in solving this? Thanks! – Hanif Mar 07 '18 at 08:34

2 Answers2

0

You just need to use pd.concat instead of merge or join. Here is an example

import pandas as pd
import numpy as np

a = np.arange(1,4)
b = np.arange(5,8)
c = np.random.randint(0,10,size=3)
d = np.random.randint(0,10,size=3)
df_1 = pd.DataFrame({'a':a,'b':b,'c':c,'d':d})

out:

    a   b   c   d
0   1   5   5   1
1   2   6   7   5
2   3   7   6   9

a = np.arange(4,7)
b = np.arange(7,10)
e = np.random.randint(0,10,size=3)
f = np.random.randint(0,10,size=3)
df_2 = pd.DataFrame({'a':a,'b':b,'e':c,'f':d})
df_2

out:

    a   b   e   f
0   4   7   9   9
1   5   8   9   3
2   6   9   2   1

pd.concat([df_1,df_2])

out:

    a   b    c      d       e       f
0   1   5   5.0     1.0     NaN     NaN
1   2   6   7.0     5.0     NaN     NaN
2   3   7   6.0     9.0     NaN     NaN
0   4   7   NaN     NaN     9.0     9.0
1   5   8   NaN     NaN     9.0     3.0
2   6   9   NaN     NaN     2.0     1.0
vumaasha
  • 2,471
  • 4
  • 23
  • 39
0

You can use:

  • first concat columns c1, c2, sort and if necessary remove duplicated
  • merge both DataFrames by left join
  • last repeat values by ffill and bfill

df = (pd.concat([df1[['c1','c2']], df2[['c1','c2']]])
        .sort_values(['c1','c2'])
        .drop_duplicates()
        .merge(df1, on=['c1','c2'], how='left')
        .merge(df2, on=['c1','c2'], how='left')
)

df[['c4','c5']] = df.groupby('c1')['c4','c5'].apply(lambda x: x.ffill().bfill())
print (df)
    c1  c2   c4      c5     c6     c7
0   10   1   hh  2231.0    NaN    NaN
1   10   2   hh  2231.0  110.0  231.0
2   10   3   hh  2231.0  111.0  332.0
3   11   1  fgf  2142.0    NaN    NaN
4   11   2  fgf  2142.0  112.0  123.0
5   11   3  fgf  2142.0  113.0  432.0
6   12   1   fg  1232.0    NaN    NaN
7   12   2   fg  1232.0  114.0  432.0
8   12   3   fg  1232.0  115.0  432.0
9   13   2  NaN     NaN  116.0  432.0
10  13   3  NaN     NaN  117.0  432.0
11  14   2  NaN     NaN  118.0  432.0
12  14   3  NaN     NaN  119.0  432.0
jezrael
  • 629,482
  • 62
  • 918
  • 895