237

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

Will return something like

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2 

But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2  

The reason being that there is also a separate table of users that link to these organizations, and I would like to unify them (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I would like part manually so I don't screw anything up, but I would still need a statement returning the IDs of all the dupe orgs so I can go through the list of users.

Elrond_EGLDer
  • 47,430
  • 25
  • 189
  • 180
xtine
  • 2,519
  • 2
  • 17
  • 15

18 Answers18

322
select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName
D'Arcy Rittich
  • 153,827
  • 35
  • 271
  • 277
  • 4
    are there any limitations in this query, for example if number of records is 10million plus ? – Steam Feb 06 '14 at 22:22
  • 3
    @Steam You are correct: this answer is not efficient in a larger database with millions of records. Prefer the GroupBy/Having answer submitted by Aykut, which can be better optimized by the database. One exception: I suggest using Count(0) instead of Count(*), to simplify things. – Mike Christian Sep 22 '14 at 23:25
  • 1
    @Mike - why Count(0) vs Count(*)? – KornMuffin Sep 14 '15 at 12:03
  • 2
    @KornMuffin In retrospect, my comment on Count() is void. Using a non-null evaluation in Count() is useful only when you want to count non-null results returned by an outer join. Otherwise, use Count(*). A great explanation is found [here](http://stackoverflow.com/questions/2710621/count-vs-count1-vs-countpk-which-is-better). – Mike Christian Oct 12 '15 at 23:39
  • use `isnull()` for nullable columns on `on` section – Arif Ulusoy Mar 09 '17 at 16:01
94

You can run the following query and find the duplicates with max(id) and delete those rows.

SELECT orgName, COUNT(*), Max(ID) AS dupes 
FROM organizations 
GROUP BY orgName 
HAVING (COUNT(*) > 1)

But you'll have to run this query a few times.

Bellash
  • 5,719
  • 3
  • 39
  • 72
Aykut Akıncı
  • 1,125
  • 6
  • 8
31

You can do it like this:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return just the records that can be deleted (leaving one of each), you can use:

SELECT
    id, orgName
FROM (
     SELECT 
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id
Paul
  • 14,537
  • 13
  • 38
  • 49
10

You can try this , it is best for you

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations 
    )
    select * from CTE where RN>1
    go
Soner Gönül
  • 91,172
  • 101
  • 184
  • 324
code save
  • 896
  • 1
  • 8
  • 15
9

The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql

SELECT n1.* 
FROM myTable n1
INNER JOIN myTable n2 
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id
Community
  • 1
  • 1
ecairol
  • 5,166
  • 1
  • 20
  • 19
7

If you want to delete duplicates:

WITH CTE AS(
   SELECT orgName,id,
       RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
   FROM organizations
)
DELETE FROM CTE WHERE RN > 1
akd
  • 5,782
  • 12
  • 59
  • 102
6
select * from [Employees]

For finding duplicate Record 1)Using CTE

with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte

2)By Using GroupBy

select Name,EmailId,COUNT(name) as Duplicate from  [Employees] group by Name,EmailId 
MD XF
  • 7,062
  • 7
  • 34
  • 64
Debendra Dash
  • 3,944
  • 35
  • 32
4
Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1

So the records with rowum> 1 will be the duplicate records in your table. ‘Partition by’ first group by the records and then serialize them by giving them serial nos. So rownum> 1 will be the duplicate records which could be deleted as such.

Mike Clark
  • 1,774
  • 10
  • 21
  • I like this one because it allows you to easily add more columns in the inner select clause. So if you wanted to return other columns from the 'Organizations' table you don't have to do a 'group by' on those columns. – Gwasshoppa Apr 22 '15 at 03:26
2
select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;

Src : https://stackoverflow.com/a/59242/1465252

Community
  • 1
  • 1
iCrazybest
  • 2,429
  • 2
  • 21
  • 21
2
select a.orgName,b.duplicate, a.id
from organizations a
inner join (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id
Roman Marusyk
  • 19,402
  • 24
  • 55
  • 90
1

You have several way for Select duplicate rows.

for my solutions , first consider this table for example

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution :

SELECT DISTINCT *
FROM   #Employee;

WITH #DeleteEmployee AS (
                     SELECT ROW_NUMBER()
                            OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
                            RNUM
                     FROM   #Employee
                 )

SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1

SELECT DISTINCT *
FROM   #Employee

Secound solution : Use identity field

SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

and end of all solution use this command

DROP TABLE #Employee
Ardalan Shahgholi
  • 10,041
  • 13
  • 95
  • 124
1
select orgname, count(*) as dupes, id 
from organizations
where orgname in (
    select orgname
    from organizations
    group by orgname
    having (count(*) > 1)
)
group by orgname, id
Jordão
  • 51,321
  • 12
  • 105
  • 132
0

i think i know what you need i needed to mix between the answers and i think i got the solution he wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
    SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

having the max id will give you the id of the dublicate and the one of the original which is what he asked for:

id org name , dublicate count (missing out in this case) 
id doublicate org name , doub count (missing out again because does not help in this case)

only sad thing you get it put out in this form

id , name , dubid , name

hope it still helps

0

Suppose we have table the table 'Student' with 2 columns:

  • student_id int
  • student_name varchar

    Records:
    +------------+---------------------+
    | student_id | student_name        |
    +------------+---------------------+
    |        101 | usman               |
    |        101 | usman               |
    |        101 | usman               |
    |        102 | usmanyaqoob         |
    |        103 | muhammadusmanyaqoob |
    |        103 | muhammadusmanyaqoob |
    +------------+---------------------+
    

Now we want to see duplicate records Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+
Rob
  • 25,569
  • 15
  • 73
  • 87
Usman Yaqoob
  • 453
  • 3
  • 12
0

I got a better option to get the duplicate records in a table

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

Result of the above query shows all the duplicate names with unique student ids and number of duplicate occurances

Click here to see the result of the sql

SoftIdea
  • 1
  • 2
0
 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1
Code
  • 467
  • 4
  • 8
0

I use two methods to find duplicate rows. 1st method is the most famous one using group by and having. 2nd method is using CTE - Common Table Expression.

As mentioned by @RedFilter this way is also right. Many times I find CTE method is also useful for me.

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName) 
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

In the example above we collected the result by finding repeat occurrence using ROW_NUMBER and PARTITION BY. Then we applied where clause to select only rows which are on repeat count more than 1. All the result is collected CTE table and joined with Organizations table.

Source : CodoBee

David Buck
  • 3,439
  • 29
  • 24
  • 31
Ishrar
  • 25
  • 6
-2

Try

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;
Aniket Kulkarni
  • 12,142
  • 9
  • 65
  • 83
ryan
  • 1