0

I have a MySQL table with many duplicate rows. How can I go about finding the id's and deleting them. I need to leave the first lead_id and delete the any other duplicates.

So in this sample I would need to find the duplicate email values and delete all the rows. IE. delete all the rows with a lead_id of 40944 and keep all the 40943.

id      |   lead_id     | form  |field_number   |   value
--------+---------------+-------+---------------+----------------------
537618  |   40943       |1      | 3.3           |   Mike
537622  |   40943       |1      | 4.3           |   Mesa
537623  |   40943       |1      | 4.4           |   AZ
537624  |   40943       |1      | 4.5           |   85210
537625  |   40943       |1      | 4.6           |   United States
537626  |   40943       |1      | 5             |   mike@email.com
537627  |   40943       |1      | 6             |   (555) 555-5555
537628  |   40943       |1      | 19            |   JM-SL-I4CLR,JM-FM-I5CLR
537629  |   40943       |1      | 12            |   2015-10-01
547618  |   40944       |1      | 3.3           |   Mike
547622  |   40944       |1      | 4.3           |   Mesa
547623  |   40944       |1      | 4.4           |   AZ
547624  |   40944       |1      | 4.5           |   85210
547625  |   40944       |1      | 4.6           |   United States
547626  |   40944       |1      | 5             |   mike@email.com
547627  |   40944       |1      | 6             |   (555) 555-5555
547628  |   40944       |1      | 19            |   JM-SL-I4CLR,JM-FM-I5CLR
547629  |   40944       |1      | 12            |   2015-10-01

I have tried :

SELECT `value`, count(*) 
 FROM `lead_detail` 
 WHERE `field_number` = 5 
 GROUP BY `value` 
 HAVING count(*) > 1

Results

value          |    count(*)
---------------+------------------
mike@email.com |    2

Just not sure how to delete the rows?

3 Answers3

2

You can simply do it by using something similar to this. I've used this by myself and got the work done successfully..

DELETE t1 FROM lead_detail t1, lead_detail t2 
WHERE t1.id > t2.id AND t1.field_number = t2.field_number

And also you are free to expand/change the where part as necessary (only if required).

kuma DK
  • 1,694
  • 16
  • 33
1

This should return the lead_id you want to delete. I suggest running it first, storing result in temporary table and do some queries to make sure you don't lose anything. This will work even if the records are mixed (Duplicates are not consecutive inputs to the leads table)

select distinct(l1.lead_id)
from lead_detail l1
inner join lead_detail l2 on  l1.value = l2.value AND l1.field_number = 5 AND l2.field_number = 5 AND l1.id != l2.id LIMIT
18446744073709551610  OFFSET 1

18446744073709551610 is because of this Can't do offset without limit

Offset skips the first one (to make sure you keep one record)

Do a

Delete from lead_detail where lead_id in (above query) 

After you double check result

Community
  • 1
  • 1
Moussa Khalil
  • 625
  • 5
  • 12
0

You're just looking for duplicate emails so you probably don't need this. But what happens when later leads have more details than initial lead? This query is here only for reference if you want to delete only when all fields are duplicates.

DELETE FROM lead_detail
WHERE lead_id in
  (SELECT * FROM (SELECT lead_id FROM 
                    (SELECT lead_id,
                            GROUP_CONCAT(form ORDER BY form,field_number)as forms,
                            GROUP_CONCAT(field_number ORDER BY form,field_number) as field_numbers,
                            GROUP_CONCAT(value ORDER BY form,field_number) as `values`
                     FROM lead_detail
                     GROUP BY lead_id)l1
   WHERE EXISTS (SELECT 1 FROM 
                 (SELECT lead_id,
                            GROUP_CONCAT(form ORDER BY form,field_number)as forms,
                            GROUP_CONCAT(field_number ORDER BY form,field_number) as field_numbers,
                            GROUP_CONCAT(value ORDER BY form,field_number) as `values`
                     FROM lead_detail
                     GROUP BY lead_id)l2
                 WHERE l2.lead_id < l1.lead_id
                 AND l2.forms = l1.forms
                 AND l2.field_numbers = l1.field_numbers
                 AND l2.`values` = l1.`values`)
   )T
   )
Tin Tran
  • 5,995
  • 2
  • 15
  • 32