I have a table:
table1
unique_id user_id user_seq col_name value_val position
1 100 1 test1 100 1
1 100 1 test2 123 1
1 100 1 test1 a 2
1 100 1 test2 text 2
1 100 1 test3 1Rw 2
1 100 1 test4 1Tes 2
2 101 1 test1 1 1
2 101 1 test2 1 1
2 101 1 test3 1 1
2 101 1 test4 1 1
2 101 1 test5 1 1
3 100 1 test1 100 1
3 100 1 test2 123 1
3 100 1 test1 a 2
3 100 1 test2 text 2
3 100 1 test3 1Rw 2
3 100 1 test4 1Tes 2
4 101 1 test1 1 1
4 101 1 test2 1 1
4 101 1 test3 1 1
4 101 1 test4 1 1
I need to find duplicate based on the following:
user_id
, user_seq
, col_name
, value_val
and position
should be exactly same for different unique_id.
In the above example, unique_id
- 1 and 3 are EXACTLY same, so they should be returned as output.
For unique_id
= 2 and 4, there is a difference of test5 not available for unique_id
= 4, so it won't be captured.
Output would be:
unique_id
1
3
Also, my dataset is huge, around 50 million records, so need an optimized solution. Any help?
EDIT
My table structure:
Name Null? Type
----------- ----- --------------
UNIQUE_ID NUMBER
USER_SEQ VARCHAR2(100)
COL_NAME VARCHAR2(263)
VALUE_VAL VARCHAR2(4000)
POSITION NUMBER
USER_ID NUMBER
No indexes available.