SQL: Removing Duplicate records - Albeit different kind

Question

Consider the following table:

TAB6
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

I consider, the records {1,2, A} and {2, 1, A} as duplicate. I need to select and produce the below record set:

         A          B C                      A          B C
---------- ---------- -             ---------- ---------- -
         1          2 A         or           2          1 A
         2          3 C                      2          3 C
         3          4 D                      3          4 D

I tried the below queries. But to no avail.

select t1.*
from t6 t1
, t6 t2
where t1.a <> t2.b
and t1.b <> t2.a
and t1.rowid <> t2.rowid
/

         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          1 A
         2          3 C
         3          4 D
         3          4 D

6 rows selected.

Or even this:

 select *
 from t6 t1
 where exists (select * from t6 t2 where t1.a <> t2.b and t1.b <> t2.a)
/
         A          B C
---------- ---------- -
         1          2 A
         2          1 A
         2          3 C
         3          4 D

Both did not work.

The database would be Oracle 10g. Looking for a pure SQL solution. Every help is appreciated.

What exactly are you trying to accomplish? Please expand on this. — simchona, Jan 12 '12 at 04:05
I need a SQL to produce the record set {1, 2, A}, {2, 3, C} and {3, 4, D}. For me {1, 2, A} and {2, 1, A} are duplicate records and the result set should have only one tuple (either {1, 2, A} or {2, 1, A}, but not both) — G P, Jan 12 '12 at 04:08
Just to be clear, by "remove" you mean *filter from result set* rather than *delete*. — APC, Jan 12 '12 at 04:23
@APC, you are right. Removing means, filtering in the result set. Thanks for the correction. — G P, Jan 12 '12 at 06:17

APC · Accepted Answer · 2012-01-12T09:59:46.170

Use GREATEST() and LEAST() functions to identify the common values across multiple columns. Then use DISTINCT to winnow out the duplicates.

select distinct least(a, b) as a
       , greatest(a, b) as b
       , c
from t6

This gives you the precise record set you asked for. But things will get more complicated if you need to include other columns from T6.

"But I was wondering if this will work for VARCHAR2 fields also?"

Yes but it will use ASCII values to determine order, which is not always what you might expect (or desire).

"Also, my table T6 might have tens of thousand of records."

That really isn't a lot of data in today's terms. The DISTINCT will cause a sort, which should be able to fit in memory unless A and B are really long VARCHAR2 columns - but probably even then.

If this is a query you're going to want to run a lot then you can build a function-based index to satisfy it:

create index t6_fbi on t6(least(a, b)
                           , greatest(a, b)
                           , c )
/

But I would really only bother if you have a genuine performance issue with the query.

Thanks @APC, I could not check this now. But I was wondering if this will work for VARCHAR2 fields also? Also, my table T6 might have tens of thousand of records.. I appreciate your response, but still Stackoverflow doesn't allow me to up vote you yet. :( — G P, Jan 12 '12 at 06:20

score 0 · Answer 2 · answered Jan 12 '12 at 04:19

0

If the order of columns A and B do not matter and always contain an integer, how about:

select distinct
  least(a, b) as a,
  greatest(a, b) as b,
  c
from
  t6

answered Jan 12 '12 at 04:19

Dan A.

2,885
16
22

SQL: Removing Duplicate records - Albeit different kind

2 Answers2

Linked