0

I have two dataframes that are exactly the same:

df1:

P1    ID
1     10
2     10
3     10
2     5
3     5  

df2:

P2    ID
1     10
2     10
3     10
2     5
3     5  

I want to merge them on ID, but I do not want to repeat the values of the P columns in the same row. That is I want the following result:

P1    ID    P2
1     10    2
1     10    3
2     10    1
2     10    3
3     10    1
3     10    2 
2     5     3
3     5     2

Is there a way to do this without doing regular merge and then removing rows with repeated values in P1 and P2?

BKS
  • 1,833
  • 3
  • 20
  • 42
  • But why `without doing regular merge and then removing rows with repeated values` – yatu Feb 05 '19 at 21:08
  • @yatu because the files are so large and I keep running out of memory. I can divide one of the dataframes into subframes and merge each separately and remove repeated values, but I'm wondering if there is a more efficient way – BKS Feb 05 '19 at 21:11
  • So you want no duplicates of `ID`, `P1`, or `P2`? – Jab Feb 05 '19 at 21:12
  • I'm sorry but the logic for merging is not clear here. Can you explain why/how the result is like this? – cs95 Feb 05 '19 at 21:19
  • It looks to me like you want a cross join for each ID, but without rows where P1 == P2. I can't see it any other way, and your explanation leaves a lot to be desired. Help us out please? – cs95 Feb 05 '19 at 21:21
  • 2
    A solution to what I _think_ your question is asking would be `df1.merge(df2, on='ID').query('P1 != P2')` – cs95 Feb 05 '19 at 21:21
  • @Jaba I do not want to have a row where P1 and P2 have equal values. – BKS Feb 05 '19 at 21:23
  • 1
    What a beautiful dup target you've created @coldspeed :P. – d_kennetz Feb 05 '19 at 21:34
  • 1
    @d_kennetz Thanks! I'd rather answer in the comments and close these basic merge questions as duplicate than leave them open to receive the same, tired answers over and over. – cs95 Feb 05 '19 at 21:41

0 Answers0