Finding duplicate rows in SQL Server

Question

I have a SQL Server database of organizations, and there are many duplicate rows. I want to run a select statement to grab all of these and the amount of dupes, but also return the ids that are associated with each organization.

A statement like:

SELECT     orgName, COUNT(*) AS dupes  
FROM         organizations  
GROUP BY orgName  
HAVING      (COUNT(*) > 1)

Will return something like

orgName        | dupes  
ABC Corp       | 7  
Foo Federation | 5  
Widget Company | 2

But I'd also like to grab the IDs of them. Is there any way to do this? Maybe like a

orgName        | dupeCount | id  
ABC Corp       | 1         | 34  
ABC Corp       | 2         | 5  
...  
Widget Company | 1         | 10  
Widget Company | 2         | 2

The reason being that there is also a separate table of users that link to these organizations, and I would like to unify them (therefore remove dupes so the users link to the same organization instead of dupe orgs). But I would like part manually so I don't screw anything up, but I would still need a statement returning the IDs of all the dupe orgs so I can go through the list of users.

score 322 · Accepted Answer · answered Jan 21 '10 at 20:32

322

select o.orgName, oc.dupeCount, o.id
from organizations o
inner join (
    SELECT orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

answered Jan 21 '10 at 20:32

D'Arcy Rittich

153,827
35
271
277

4

are there any limitations in this query, for example if number of records is 10million plus ? – Steam Feb 06 '14 at 22:22
3

@Steam You are correct: this answer is not efficient in a larger database with millions of records. Prefer the GroupBy/Having answer submitted by Aykut, which can be better optimized by the database. One exception: I suggest using Count(0) instead of Count(*), to simplify things. – Mike Christian Sep 22 '14 at 23:25
1

@Mike - why Count(0) vs Count(*)? – KornMuffin Sep 14 '15 at 12:03
2

@KornMuffin In retrospect, my comment on Count() is void. Using a non-null evaluation in Count() is useful only when you want to count non-null results returned by an outer join. Otherwise, use Count(*). A great explanation is found [here](http://stackoverflow.com/questions/2710621/count-vs-count1-vs-countpk-which-is-better). – Mike Christian Oct 12 '15 at 23:39
use `isnull()` for nullable columns on `on` section – Arif Ulusoy Mar 09 '17 at 16:01

score 94 · Answer 2 · edited May 22 '14 at 11:01

94

You can run the following query and find the duplicates with max(id) and delete those rows.

SELECT orgName, COUNT(*), Max(ID) AS dupes 
FROM organizations 
GROUP BY orgName 
HAVING (COUNT(*) > 1)

But you'll have to run this query a few times.

edited May 22 '14 at 11:01

Bellash

5,719
3
39
72

answered Jan 21 '10 at 20:32

Aykut Akıncı

1,125
6
8

You have to run it exactly `MAX( COUNT(*) ) - 1` times, which might still be feasible. – DerMike Aug 09 '16 at 13:11
2

hi is their any way to get all the id's instead of max id like for 2 i can use max and min but what about more than 2? @DerMike – Arijit Mukherjee Dec 05 '16 at 08:14

Paul · Answer 3 · 2010-01-21T21:48:45.710

31

You can do it like this:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName

If you want to return just the records that can be deleted (leaving one of each), you can use:

SELECT
    id, orgName
FROM (
     SELECT 
         orgName, id,
         ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY id) AS intRow
     FROM organizations
) AS d
WHERE intRow != 1

Edit: SQL Server 2000 doesn't have the ROW_NUMBER() function. Instead, you can use:

SELECT
    o.id, o.orgName, d.intCount
FROM (
     SELECT orgName, COUNT(*) as intCount, MIN(id) AS minId
     FROM organizations
     GROUP BY orgName
     HAVING COUNT(*) > 1
) AS d
    INNER JOIN organizations o ON o.orgName = d.orgName
WHERE d.minId != o.id

edited Jan 21 '10 at 21:48

answered Jan 21 '10 at 20:37

Paul

14,537
13
38
49

The first statement works, but the second one doesn't seem to work. – xtine Jan 21 '10 at 20:49
SQL Server doesn't seem to be able to recognize row_number() ? – xtine Jan 21 '10 at 20:54
Ah...do you have an older version of SQL Server? I believe it was introduced in SQL Server 2005. – Paul Jan 21 '10 at 21:06
3

thanks again, every time i need to do this i get here and LOVE YOU – workabyte Mar 08 '14 at 00:18

score 10 · Answer 4 · edited Nov 07 '13 at 08:19

10

You can try this , it is best for you

 WITH CTE AS
    (
    SELECT *,RN=ROW_NUMBER() OVER (PARTITION BY orgName ORDER BY orgName DESC) FROM organizations 
    )
    select * from CTE where RN>1
    go

edited Nov 07 '13 at 08:19

Soner Gönül

91,172
101
184
324

answered Nov 07 '13 at 08:01

code save

896
1
8
15

any way to get all the id in comma split or different columns – Arijit Mukherjee Dec 05 '16 at 08:16

score 9 · Answer 5 · edited May 23 '17 at 11:47

9

The solution marked as correct didn't work for me, but I found this answer that worked just great: Get list of duplicate rows in MySql

SELECT n1.* 
FROM myTable n1
INNER JOIN myTable n2 
ON n2.repeatedCol = n1.repeatedCol
WHERE n1.id <> n2.id

edited May 23 '17 at 11:47

Community

1
1

answered May 20 '13 at 20:27

ecairol

5,166
1
20
19

You'll get lots of dupes in the result set, so you'll have to deal with those too. – Geeky Guy May 09 '16 at 22:25
1

If the id is numeric, checking `n1.id > n2.id` will prevent each pair from showing up twice. – starwed Jun 07 '16 at 19:59

score 7 · Answer 6 · answered Jun 17 '16 at 09:51

7

If you want to delete duplicates:

WITH CTE AS(
   SELECT orgName,id,
       RN = ROW_NUMBER()OVER(PARTITION BY orgName ORDER BY Id)
   FROM organizations
)
DELETE FROM CTE WHERE RN > 1

answered Jun 17 '16 at 09:51

akd

5,782
12
59
102

score 6 · Answer 7 · edited Mar 09 '17 at 05:22

6

select * from [Employees]

For finding duplicate Record 1)Using CTE

with mycte
as
(
select Name,EmailId,ROW_NUMBER() over(partition by Name,EmailId order by id) as Duplicate from [Employees]
)
select * from mycte

2)By Using GroupBy

select Name,EmailId,COUNT(name) as Duplicate from  [Employees] group by Name,EmailId

edited Mar 09 '17 at 05:22

MD XF

7,062
7
34
64

answered Nov 18 '16 at 12:44

Debendra Dash

3,944
35
32

This the fastest solution here, when SELECTing data over 10m rows that is. Thanks – Fandango68 Jan 17 '18 at 23:49

score 4 · Answer 8 · answered Mar 10 '15 at 05:58

4

Select * from (Select orgName,id,
ROW_NUMBER() OVER(Partition By OrgName ORDER by id DESC) Rownum
From organizations )tbl Where Rownum>1

So the records with rowum> 1 will be the duplicate records in your table. ‘Partition by’ first group by the records and then serialize them by giving them serial nos. So rownum> 1 will be the duplicate records which could be deleted as such.

answered Mar 10 '15 at 05:58

Mike Clark

1,774
10
21

I like this one because it allows you to easily add more columns in the inner select clause. So if you wanted to return other columns from the 'Organizations' table you don't have to do a 'group by' on those columns. – Gwasshoppa Apr 22 '15 at 03:26

score 2 · Answer 9 · edited May 23 '17 at 11:54

2

select column_name, count(column_name)
from table_name
group by column_name
having count (column_name) > 1;

Src : https://stackoverflow.com/a/59242/1465252

edited May 23 '17 at 11:54

Community

1
1

answered Mar 17 '15 at 04:01

iCrazybest

2,429
2
21
21

This will only work on tables that have a single column. Which is most probably not useful – Zach Smith Oct 13 '16 at 14:13

score 2 · Answer 10 · edited Sep 15 '15 at 07:04

2

select a.orgName,b.duplicate, a.id
from organizations a
inner join (
    SELECT orgName, COUNT(*) AS duplicate
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) b on o.orgName = oc.orgName
group by a.orgName,a.id

edited Sep 15 '15 at 07:04

Roman Marusyk

19,402
24
55
90

answered Sep 15 '15 at 07:01

user5336758

21
1

Ardalan Shahgholi · Answer 11 · 2013-11-07T07:15:12.243

You have several way for Select duplicate rows.

for my solutions , first consider this table for example

CREATE TABLE #Employee
(
ID          INT,
FIRST_NAME  NVARCHAR(100),
LAST_NAME   NVARCHAR(300)
)

INSERT INTO #Employee VALUES ( 1, 'Ardalan', 'Shahgholi' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 2, 'name1', 'lname1' );
INSERT INTO #Employee VALUES ( 3, 'name2', 'lname2' );
INSERT INTO #Employee VALUES ( 4, 'name3', 'lname3' );

First solution :

SELECT DISTINCT *
FROM   #Employee;

WITH #DeleteEmployee AS (
                     SELECT ROW_NUMBER()
                            OVER(PARTITION BY ID, First_Name, Last_Name ORDER BY ID) AS
                            RNUM
                     FROM   #Employee
                 )

SELECT *
FROM   #DeleteEmployee
WHERE  RNUM > 1

SELECT DISTINCT *
FROM   #Employee

Secound solution : Use identity field

SELECT DISTINCT *
FROM   #Employee;

ALTER TABLE #Employee ADD UNIQ_ID INT IDENTITY(1, 1)

SELECT *
FROM   #Employee
WHERE  UNIQ_ID < (
    SELECT MAX(UNIQ_ID)
    FROM   #Employee a2
    WHERE  #Employee.ID = a2.ID
           AND #Employee.FIRST_NAME = a2.FIRST_NAME
           AND #Employee.LAST_NAME = a2.LAST_NAME
)

ALTER TABLE #Employee DROP COLUMN UNIQ_ID

SELECT DISTINCT *
FROM   #Employee

and end of all solution use this command

DROP TABLE #Employee

Jordão · Answer 12 · 2010-01-21T21:06:15.847

1

select orgname, count(*) as dupes, id 
from organizations
where orgname in (
    select orgname
    from organizations
    group by orgname
    having (count(*) > 1)
)
group by orgname, id

edited Jan 21 '10 at 21:06

answered Jan 21 '10 at 20:49

Jordão

51,321
12
105
132

score 0 · Answer 13 · answered Oct 01 '14 at 08:13

i think i know what you need i needed to mix between the answers and i think i got the solution he wanted:

select o.id,o.orgName, oc.dupeCount, oc.id,oc.orgName
from organizations o
inner join (
    SELECT MAX(id) as id, orgName, COUNT(*) AS dupeCount
    FROM organizations
    GROUP BY orgName
    HAVING COUNT(*) > 1
) oc on o.orgName = oc.orgName

having the max id will give you the id of the dublicate and the one of the original which is what he asked for:

id org name , dublicate count (missing out in this case) 
id doublicate org name , doub count (missing out again because does not help in this case)

only sad thing you get it put out in this form

id , name , dubid , name

hope it still helps

score 0 · Answer 14 · edited Feb 09 '18 at 07:32

Suppose we have table the table 'Student' with 2 columns:

student_id int

student_name varchar

Records:
+------------+---------------------+
| student_id | student_name        |
+------------+---------------------+
|        101 | usman               |
|        101 | usman               |
|        101 | usman               |
|        102 | usmanyaqoob         |
|        103 | muhammadusmanyaqoob |
|        103 | muhammadusmanyaqoob |
+------------+---------------------+

Now we want to see duplicate records Use this query:

select student_name,student_id ,count(*) c from student group by student_id,student_name having c>1;

+---------------------+------------+---+
| student_name        | student_id | c |
+---------------------+------------+---+
| usman               |        101 | 3 |
| muhammadusmanyaqoob |        103 | 2 |
+---------------------+------------+---+

SoftIdea · Answer 15 · 2018-06-25T09:40:30.090

I got a better option to get the duplicate records in a table

SELECT x.studid, y.stdname, y.dupecount
FROM student AS x INNER JOIN
(SELECT a.stdname, COUNT(*) AS dupecount
FROM student AS a INNER JOIN
studmisc AS b ON a.studid = b.studid
WHERE (a.studid LIKE '2018%') AND (b.studstatus = 4)
GROUP BY a.stdname
HAVING (COUNT(*) > 1)) AS y ON x.stdname = y.stdname INNER JOIN
studmisc AS z ON x.studid = z.studid
WHERE (x.studid LIKE '2018%') AND (z.studstatus = 4)
ORDER BY x.stdname

Result of the above query shows all the duplicate names with unique student ids and number of duplicate occurances

Click here to see the result of the sql

score 0 · Answer 16 · answered Apr 26 '19 at 14:01

0

 /*To get duplicate data in table */

 SELECT COUNT(EmpCode),EmpCode FROM tbl_Employees WHERE Status=1 
  GROUP BY EmpCode HAVING COUNT(EmpCode) > 1

answered Apr 26 '19 at 14:01

Code

467
4
8

score 0 · Answer 17 · edited May 05 '20 at 10:00

I use two methods to find duplicate rows. 1st method is the most famous one using group by and having. 2nd method is using CTE - Common Table Expression.

As mentioned by @RedFilter this way is also right. Many times I find CTE method is also useful for me.

WITH TempOrg (orgName,RepeatCount)
AS
(
SELECT orgName,ROW_NUMBER() OVER(PARTITION by orgName ORDER BY orgName) 
AS RepeatCount
FROM dbo.organizations
)
select t.*,e.id from organizations   e
inner join TempOrg t on t.orgName= e.orgName
where t.RepeatCount>1

In the example above we collected the result by finding repeat occurrence using ROW_NUMBER and PARTITION BY. Then we applied where clause to select only rows which are on repeat count more than 1. All the result is collected CTE table and joined with Organizations table.

Source : CodoBee

score -2 · Answer 18 · edited Jan 22 '14 at 06:49

-2

Try

SELECT orgName, id, count(*) as dupes
FROM organizations
GROUP BY orgName, id
HAVING count(*) > 1;

edited Jan 22 '14 at 06:49

Aniket Kulkarni

12,142
9
65
83

answered Jan 22 '14 at 06:29

ryan

1

Finding duplicate rows in SQL Server

18 Answers18

Linked

Related