GROUP BY another table that have been grouped with two sub query

Question

I have table like this

Table1

ID      |    Val         |     Val2       |
606541  |3175031503131004|3175032612900004|
606542  |3175031503131004|3175032612900004|
677315  |3175031503131004|3175032612980004|
222222  |1111111111111111|8888888888888888|
231233  |1111111111111111|3175032612900004|
111111  |9999992222211111|1111111111111111|
57      |3173012102121018|3173015101870020|
59      |3173012102121018|3173021107460002|
2       |900             |7000            |
4       |900             |7001            |

I have two condition with column Val and Val2. Show the result if the Val:

Val column has at least two or more duplicate values AND
Val2 column has no duplicate value (unique)

For example :

Sample 1

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|  
 677315  |3175031503131004|3175032612980004|

 False, because  even the Val column 
 had two or more duplicate but the Val2 
 had dulicate value (ID 606541  and 606542)

Sample Expected 1 Result

 No records

Sample 2

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|   
 111111  |9999992222211111|1111111111111111|

 True, Because the condition is match, 
 Val column had duplicate value AND Val2 had unique values

Sample 2 Expected Result

 ID      |    Val         |     Val2       |
 222222  |1111111111111111|8888888888888888|
 231233  |1111111111111111|3175032612900004|

Sample 3

 ID      |    Val         |     Val2       |
 606541  |3175031503131004|3175032612900004|
 606542  |3175031503131004|3175032612900004|
 677315  |3175031503131004|3175032612980004|
 222222  |1111111111111111|8888888888888888|     
 231233  |1111111111111111|3175032612900004|
 111111  |9999992222211111|1111111111111111|

 Note : This is false condition, Because even the value for id 606541, 606542, and
 677315 in column Val had duplicate value at least 
 two or more but the value in column Val2 had no unique value (it could be true condition if id 606541, 
 606542, and 677315 had 3 different value on Val2).

 NOte 2 : for Id 222222 and 231233 that had duplicate value, this is still false, because the column 
 Val2 with ID 231233 had the same value with ID 606542 and 606541 (3175032612900004), so it didnt match 
 the second condition which only have no duplicate value

Sample 3 Expected Result

 No records

Now back to Table1 in the earlier, i tried to show result from the two condition with this query

SELECT
tb.* FROM table1 tb 
WHERE
    tb.Val2 IN (
    SELECT ta.Val2 
    FROM (
        SELECT
            t.* 
        FROM
            table1 t 
        WHERE
            t.Val IN ( 
            SELECT Val FROM table1 
            GROUP BY Val 
            HAVING count( Val ) > 1 ) 
        ) ta 
    GROUP BY
        ta.Val2 
    HAVING
    count( ta.Val2 ) = 1 
    )

The result

ID         Val                   Val2
677315  3175031503131004    3175032612980004
222222  1111111111111111    8888888888888888
57      3173012102121018    3173015101870020
59      3173012102121018    3173021107460002
2       900                  7000            
4       900                  7001

While i expect the result was like this:

ID         Val                   Val2
57  3173012102121018    3173015101870020
59  3173012102121018    3173021107460002
2       900             7000            
4       900             7001

Is there something wrong with my query ?

Here is my DB Fiddle.

The way you’ve written the query as nested sub queries, it’s only checking that val2 is singular in the dataset returned from the check that val1 has multiples. Instead of nesting the sub worries try writing them as two separate subqueries ANDed in the WHERE clause. — RBarryYoung, Jul 06 '20 at 23:02
I mean i dont know how to do your advice with my condition. Is there any link or references for that ? — Gagantous, Jul 06 '20 at 23:10
In many cases nested conditions will return the same results as ANDed conditions, however in this case because your conditions are supposed to be based on counts from the total rowset, they will return different results. — RBarryYoung, Jul 06 '20 at 23:16

Harshil Doshi · Answer 1 · 2020-07-15T15:09:34.880

7

You have to use Group By to find val & val2 with duplicate values and need to use Inner Join and Left Join in order to include/eliminate records as given conditions (oppose to IN, NOT IN etc. clauses that might cause performance issues in case you're dealing with large data).

Please find the query below:

select t1.*from table1 t1 left join
      (select val from table1
       where val2 in (select val2 from table1 group by val2 having count(id) > 1)
        ) t2
 on t1.val = t2.val
 inner join
     (select val from table1 group by val having count(id) >1) t3
     on t1.val = t3.val
 where t2.val is null

Query for Reverse Condition:

select t1.*from table1 t1 inner join
       (select val from table1 group by val having count(id) = 1)
         t2
 on t1.val = t2.val
 inner join
     (select val2 from table1 group by val2 having count(id) >1) t3
     on t1.val2 = t3.val2

Please find fiddle for both queries here.

edited Jul 15 '20 at 15:09

answered Jul 10 '20 at 18:35

Harshil Doshi

3,419
3
11
30

can u add expected output in case of `val colum is unique and val2 is duplicate`? – Harshil Doshi Jul 15 '20 at 06:11
I believe with 'val colum is unique and val2 is duplicate' condition & `val->val2` order, you'll get 0 rows for the given data sample. Am I correct? – Harshil Doshi Jul 15 '20 at 08:33
1

yes that's right the result would be 0 rows if i used this condition `val colum is unique and val2 is duplicate` – Gagantous Jul 15 '20 at 14:54
thank you for the clarification. I have added query for the same & fiddle link for both the cases. Please validate & let me know whether it's working or not. – Harshil Doshi Jul 15 '20 at 15:10
1

Also...please accept the answer & award the Bounty to the answer that you think resolved your issue in the best way. Thank you. – Harshil Doshi Jul 15 '20 at 20:30
1

@Gagantous Thank you for rewarding the Bounty. You can tag me in the comments for any SQL related questions. – Harshil Doshi Jul 16 '20 at 15:13

Sujitmohanty30 · Accepted Answer · 2020-07-14T16:34:34.817

7

Excuse for any mistakes as this would be my first answer in this forum. Could you also try with below, i agree to the answer with window function though.

SELECT t.*
FROM   table1 t 
WHERE  t.val IN (SELECT val 
                   FROM table1 
                 GROUP BY val 
                 HAVING COUNT(val) > 1 
                    AND COUNT(val) = COUNT(DISTINCT val2)
                 )
AND    t.val NOT IN (SELECT t.val
                     FROM   table1 t
                     WHERE  EXISTS (SELECT 1
                             FROM   table1 tai
                             WHERE  tai.id != t.id
                             AND    tai.val2 = t.val2));

/* first part of where clause makes sure we have distinct values in column val2 for repeated value in column val

second part of where clause with not in tells us there is no value shares across different ids with respect to value in column val2 */

--reverse order query ( not sure gives the expected result)

SELECT t.*
FROM   table2 t
WHERE  t.val IN (SELECT val FROM table2 GROUP BY val HAVING COUNT(val) = 1)
AND    t.val2 IN (SELECT t.val2
                  FROM   table2 ta
                  WHERE  EXISTS (SELECT 1
                          FROM   table2 tai
                          WHERE  tai.id != ta.id
                          AND    tai.val = ta.val));

edited Jul 14 '20 at 16:34

answered Jul 12 '20 at 08:03

Sujitmohanty30

3,098
2
2
18

do you know how could i do it reverse ? like `val colum is unique and val2 is duplicate` ? – Gagantous Jul 14 '20 at 13:05
@Gagantous: If i understood you and the conditions remains unchanged wrt original question, just replacing **val -> val2** and **val2 -> val** in the query I posted should do the work. or did I not get the point ? – Sujitmohanty30 Jul 14 '20 at 15:48
i cant do that because the process should be in this order : `val` -> `val2`, if i used your suggestion, it would be end up with this order `val2` -> `val` – Gagantous Jul 14 '20 at 16:20
1

Pardon me as i certainly cannot imagine the output data. Could you try with the second query i added to the answer ? – Sujitmohanty30 Jul 14 '20 at 16:35
the aggregate will be different if i used with this order `val2` -> `val` – Gagantous Jul 14 '20 at 16:37
Great!! Nice to hear that. Cheers :) – Sujitmohanty30 Jul 14 '20 at 16:47

score 4 · Answer 3 · answered Jul 11 '20 at 16:24

Can you try this and let me know the results? SQL fiddle

SELECT t1.id, t1.val, t1.val2 FROM table1 t1
JOIN (
  select val from
  (select id, val, val2 from table1 group by val2 having count(1) = 1) a
  group by a.val having count(1) > 1
)t2 on t1.val = t2.val;

hossein zakizadeh · Answer 4 · 2020-07-08T09:10:18.460

3

you can use group by :

select * from (select * from #table1 where Val2 in (select Val2 val from #table1 group by Val2 having COUNT(*) =1 )) select1
         where select1.val in  (select Val val from #table1 group by Val having COUNT(*) >1)

or you can use RANK :

 select * from  ( SELECT 
     i.id,
    i.Val val,
    RANK() OVER (PARTITION BY i.val ORDER BY i.id DESC) AS Rank1,
        RANK() OVER (PARTITION BY i.val2 ORDER BY i.id DESC) AS Rank2
FROM #table1 AS i 

) select1 where  select1.Rank1 >1 or select1.Rank2 =2

edited Jul 08 '20 at 09:10

answered Jul 07 '20 at 07:01

hossein zakizadeh

101
7

why there is table2 in there ? i only used one table – Gagantous Jul 07 '20 at 07:04
why `CTE_select2 ` didnt calculate column `Val2 ` ? i dont see any `Val2` aggregate in there – Gagantous Jul 07 '20 at 07:10
got an error https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=8df1be0373458545e7810bc4139480f5 – Gagantous Jul 07 '20 at 07:18
I used SQL server, not my SQL: https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=7f8483d26d6b7d029cbff115a05e4c89 – hossein zakizadeh Jul 07 '20 at 07:41
the result still not what i expect, see my expected result – Gagantous Jul 07 '20 at 07:55
code only answers are frowned upon on SO. Please provide explanation highlighting which parts of your answer addresses OP's issue, and why. For long term value, future visitors should be able to learn from your answer to apply to their own coding issues. Quality answers keeps quality of SO high, prevents "haz me the codz" Q/A, and increases usefulness which also leads to more upvotes. Code is rather terse and easy to misinterpret, so natural language can help break through all that, leading to quick understanding. Please consider editing to add more info. – SherylHohman Jul 08 '20 at 03:58
@Gagantous did you check my answer? – hossein zakizadeh Jul 15 '20 at 05:37

score 3 · Answer 5 · answered Jul 14 '20 at 22:01

You don't need group by or having. Sub-selects will do the job just fine.

SELECT * FROM MyTable a
WHERE  (SELECT Count(*) FROM MyTable b WHERE  a.val = b.val) >= 2 
AND (SELECT Count(*)  FROM MyTable c WHERE  a.val2 = c.val2) = 1;

This looks at the table as if it was 3 identical tables, but it's just one. The first sub select

(SELECT Count(*) FROM MyTable b WHERE a.val = b.val)

returns a number containing how many occurrences of "Val" are in the table; if there are at least 2 we're good to go. The second sub select

(SELECT Count(*) FROM MyTable c WHERE a.val2 = c.val2)

returns a number containing how many occurrences of "Val2" are in the table; if it's 1 and the first sub select returns at least 2 then we print the record.

score 2 · Answer 6 · answered Jul 07 '20 at 02:44

2

If you want a solution, i think this will help.

I got the val2s which has no duplicates vals which has more than 1 duplicates and join

Select t.* from 
table1 t
inner join 
(Select val2 from table1 group by val2 having count(*) = 1) tv2 on t.val2 = tv2.val2
inner join 
(Select val from table1 group by val having count(*) > 1) tv on t.val = tv.val;

answered Jul 07 '20 at 02:44

Srinika Pinnaduwage

999
1
6
13

this is same with my query result, see this fiddle https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=3ae3b22755ea2c49583e504f7767cb6c. see the end of the question, i added the expected result – Gagantous Jul 07 '20 at 06:16
1

Your fiddle data and the data in the question samples do not match. Foe ex. Id = 2,4 are not in fiddle. – Srinika Pinnaduwage Jul 07 '20 at 11:58
still same results, i have updated my fiddle, here is fiddle of your query syntax https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=0fb7898614bbabf77bc776931ef0166d – Gagantous Jul 10 '20 at 16:16

forpas · Answer 7 · 2020-07-07T12:05:32.983

2

You can do it with EXISTS and NOT EXISTS.

If you want only the column Val:

select t1.val from table1 t1
where not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)
group by t1.val
having count(t1.val) > 1

If you want full rows:

select t1.* from table1 t1
where exists (select 1 from table1 where id <> t1.id and val = t1.val)
and not exists (
  select 1 from table1 
  where val = t1.val and val2 in (select val2 from table1 group by val2 having count(*) > 1)
)

And one solution with window functions for MySql 8.0+:

select t.id, t.val, t.val2
from (
  select *, max(counter2) over (partition by val) countermax
  from (
    select *,
      count(*) over (partition by val) counter,
      count(*) over (partition by val2) counter2
    from table1
  ) t
) t 
where t.counter > 1 and t.countermax = 1

See the demo.

edited Jul 07 '20 at 12:05

answered Jul 07 '20 at 10:51

forpas

117,400
9
23
54

I know this query seems work, but why count `val2` that more than 1 ? i thought the duplicate value is column `val`, not column `val2` – Gagantous Jul 07 '20 at 11:40
1

`select val2 from table1 group by val2 having count(*) > 1` returns all the `val2`s that are duplicated and these are the ones that must be excluded. – forpas Jul 07 '20 at 11:43
OHHHHH i didnt realized – Gagantous Jul 07 '20 at 12:31
i have another question, where is the query that search for column `val` that had duplicate value more than 1 ? i couldnt find it – Gagantous Jul 07 '20 at 12:58
1

This is what EXISTS does: `WHERE exists (select 1 from table1 where id <> t1.id and val = t1.val)` – forpas Jul 07 '20 at 13:03
how could i do it reverse ? i mean if Val had no duplicates value and Val2 had duplicate value more than 1 ? – Gagantous Jul 07 '20 at 13:24
i have tried this, but it's run very slow on table with huge data ( i have 1 Milion data ), it tooks 3000S to process – Gagantous Jul 08 '20 at 22:46
i suggested you to mark this as answer, the problem with the speed on query are the other problem. – 18Man Jul 09 '20 at 04:07
@FachryDzaky i will wait for another answer, if there is no other answer, i will give the bounty to forpass – Gagantous Jul 09 '20 at 05:42
if you run this on mysql, i suggest you to make index or make it in store procedure if the query run slow – 18Man Jul 09 '20 at 07:03
@FachryDzaky i just want to see other people answer that's all, maybe there is better approach or some interesting way to solve my problem, i did upvoted the answer tho – Gagantous Jul 09 '20 at 07:10
1

no, i meant, if thee issues also about performance of run query (because there's a lot of data) i suggest you to also made index, see https://www.drupal.org/docs/7/guidelines-for-sql/the-benefits-of-indexing-large-mysql-tables – 18Man Jul 09 '20 at 07:12
my apologize, i misunderstood the context of your comment before – Gagantous Jul 09 '20 at 07:19
I'm skeptical of the performance of `NOT EXISTS ( ... IN ( ... HAVING ) )`. I worry that the `SELECT .. HAVING` will be repeatedly be executed. (Can you get the `EXPLAIN`?) suggest you try to turn it inside out so the `SELECT ... HAVING` is performed only once. – Rick James Jul 09 '20 at 14:07

Lars Skaug · Answer 8 · 2020-07-14T12:51:15.083

Common Table Expressions may help readability and perhaps performance as well.

with dup as (select  val, count(*) -- two or more of val
             from table1
             group by val
             having count(*)>1)   
select  tb1.* 
from table1 tb1
 inner join dup
  on dup.val = tb1.val
where not exists (select val2, count(*) -- Not exists is generally fast
                  from table1
                  where val = tb1.val
                  group by 1
                  having count(*) > 1)

Fiddle

MTay · Answer 9 · 2020-07-15T00:48:23.727

I'm going through your dataset at the moment, and I feel like your final result is accurate when you compare the results to your original dataset. Your criteria used are:

Val is duplicated at least once
Val2 is unique

9999992222211111 is the only unique value in the Val list, so that's the only value I don't expect to see in the final result. For Val2, the only duplicated value is 3175032612900004, so I don't expect to see in the final result.

What it sounds like you're trying to do is to apply the original conditions to your final result table (which is different from your original data table). If that's what you're after, you can go through the same process applied to the original table to your new table, in which you'll get the exact result you want.

I've taken that and included all of this in my fiddle below. You'll see two output queries, one with the result you're seeing, and one with the result you want. Let me know if this answers your question! =)

Here's my fiddle: fiddle

Dipen Parekh · Answer 10 · 2020-07-15T08:36:27.367

The answer to your query

Is there something wrong with my query ?

is in your Note 2 of Sample 3

NOte 2 : for Id 222222 and 231233 that had duplicate value, this is still false, because the column Val2 with ID 231233 had the same value with ID 606542 and 606541 (3175032612900004), so it didnt match the second condition which only have no duplicate value

You are not eliminating the records where Val2 is duplicate with another record outside the set. So, all you need to do in your query is to add the below condition

AND tb.Val NOT IN (SELECT t.Val
               FROM table1 t 
               WHERE t.Val2 IN (SELECT Val2 FROM table1 GROUP BY Val2 HAVING count( Val2 ) > 1 ))

I have added this condition to your query and see the expected results. See fiddle below

My Fiddle

The answer given by @Govind feels like a better re-write of your requirements. It is checking for the duplicates of Val column only when there are no duplicates in Val2 column. Very neat and concise query.

Answer by Govind

score 1 · Answer 11 · answered Jul 15 '20 at 13:57

Something like this?

SELECT *
  FROM table1
 WHERE val IN
       (SELECT val
          FROM table1
         GROUP BY val
        HAVING COUNT(*) > 1 AND COUNT(DISTINCT val2) = COUNT(*))
   AND val NOT IN (SELECT t.val
                     FROM table1 t
                    INNER JOIN (SELECT val2
                                 FROM table1
                                GROUP BY val2
                               HAVING COUNT(*) > 1) x
                       ON x.val2 = t.val2);

score 0 · Answer 12 · edited Jul 15 '20 at 06:10

0

`select val, count(*) from table1 group by val having count(*)>=2;`

`val                count(*)`

`1111111111111111   2`

`3173012102121018   2`

`3175031503131004   3`

`900                2`

Val column has at least two or more duplicate values - TRUE

select val2, count(*) from table1 group by val2 having count(*)>1;

`val2   count(*)`
`3175032612900004   3`

Val2 column has no duplicate value (unique) - FALSE

So ideally you should get no records found right?

edited Jul 15 '20 at 06:10

Gagantous

690
4
21
50

answered Jul 14 '20 at 14:46

vijayan007

270
4
11

what do you mean by this ? – Gagantous Jul 16 '20 at 06:54
from your SQL fiddler, I found duplicates for val2, so answer for the given content is no records, did I get that right? – vijayan007 Jul 17 '20 at 13:19

GROUP BY another table that have been grouped with two sub query

12 Answers12