Ok.. So I have looked around at numerous topics on this subject and I have tried many queries to get the result I want, so as you've probably guessed I still haven't found the solution.
I have a client who regularly uploads information to a MySQL database relating to invoices. There are 3 columns in the table:
|------
|Column|Type|Null|Default
|------
|//**invoice**//|int(11)|No|0
|//**barcode**//|int(11)|No|
|//**invoice_date**//|datetime|No|
So you can see there is invoice, barcode and invoice_date. Each barcode is UNIQUE but you can have multiple barcodes associated to an invoice ID.
Now you're probably thinking: "Why didn't you just add a UNIQUE index on the barcode and invoice_date columns?" - I do have indexes on these columns.. Now you see, this is where the massive ball ache comes in to play. They use a scanner to scan the barcodes and this is then put into a CSV. Sometimes this messes up, so there will be a record that is EXACTLY the same but the invoice_date reference will be a couple of seconds or minutes different so MySQL will not interpret this as a duplicate even though it is the same.
Example of duped records:
|24815|86632|2008-08-21 10:22:50
|24899|86632|2008-09-04 17:12:30
|55555|86632|2008-08-21 10:34:41
So what I need to do is:
DELETE all records EXCEPT the record with the OLDEST invoice_date where the barcode is the same.
So from the above data set I would need to keep:
|24815|86632|2008-08-21 10:22:50
as thats the oldest record.
I've tried many queries.
E.g
DELETE I1 FROM v3_invoices_test I1
LEFT JOIN
(
SELECT MIN(invoice_date) AS OLDESTRECORD, barcode
FROM v3_invoices_test
) I2
ON I1.barcode = I2.barcode
WHERE OLDESTRECORD > I1.invoice_date
DELETE FROM v3_invoices_test
WHERE (barcode, invoice_date) IN (
SELECT
barcode,
invoice_date
FROM
v3_invoices_test I1
WHERE
EXIST (
SELECT *
FROM v3_invoices_test I2
WHERE I1.barcode = I2.barcode
AND I1.invoice_date < I2.invoice_date
)
)
If anyone could help it would be deeply appreciated!
Thanks