0

Ok.. So I have looked around at numerous topics on this subject and I have tried many queries to get the result I want, so as you've probably guessed I still haven't found the solution.

I have a client who regularly uploads information to a MySQL database relating to invoices. There are 3 columns in the table:

|------
|Column|Type|Null|Default
|------
|//**invoice**//|int(11)|No|0
|//**barcode**//|int(11)|No|
|//**invoice_date**//|datetime|No|

So you can see there is invoice, barcode and invoice_date. Each barcode is UNIQUE but you can have multiple barcodes associated to an invoice ID.

Now you're probably thinking: "Why didn't you just add a UNIQUE index on the barcode and invoice_date columns?" - I do have indexes on these columns.. Now you see, this is where the massive ball ache comes in to play. They use a scanner to scan the barcodes and this is then put into a CSV. Sometimes this messes up, so there will be a record that is EXACTLY the same but the invoice_date reference will be a couple of seconds or minutes different so MySQL will not interpret this as a duplicate even though it is the same.

Example of duped records:

|24815|86632|2008-08-21 10:22:50
|24899|86632|2008-09-04 17:12:30
|55555|86632|2008-08-21 10:34:41

So what I need to do is:

DELETE all records EXCEPT the record with the OLDEST invoice_date where the barcode is the same.

So from the above data set I would need to keep:

|24815|86632|2008-08-21 10:22:50 as thats the oldest record.

I've tried many queries.

E.g

DELETE I1 FROM v3_invoices_test I1
LEFT JOIN
(
    SELECT MIN(invoice_date) AS OLDESTRECORD, barcode
    FROM   v3_invoices_test
) I2 
ON I1.barcode = I2.barcode
WHERE OLDESTRECORD > I1.invoice_date

DELETE FROM v3_invoices_test
WHERE (barcode, invoice_date) IN (
    SELECT 
    barcode, 
    invoice_date
    FROM 
        v3_invoices_test I1
    WHERE 
    EXIST (
        SELECT *
        FROM v3_invoices_test I2
        WHERE I1.barcode = I2.barcode
        AND I1.invoice_date < I2.invoice_date
    )
)

If anyone could help it would be deeply appreciated!

Thanks

2 Answers2

1

Here is your fiddle: http://sqlfiddle.com/#!2/29375b/1

So your query would be:

DELETE FROM v3_invoices_test WHERE invoice NOT IN (
  SELECT invoice FROM (
    SELECT invoice FROM v3_invoices_test JOIN (
      SELECT barcode, MIN(invoice_date) m FROM v3_invoices_test GROUP BY barcode
    ) temp ON t.barcode = temp.barcode AND t.invoice_date = temp.m
  ) a
);

Sadly, the first subquery is needed because of #1093 (https://stackoverflow.com/a/14302701/1767861). Basically, the query retrieves oldest dates grouped by the barcode, and apply a deletion on all the other ids.

Community
  • 1
  • 1
Thomas Ruiz
  • 3,431
  • 2
  • 17
  • 32
0

QUERY:

DELETE t1 
FROM v3_invoices_test AS t1 
JOIN v3_invoices_test as t2 
WHERE t1.id > t2.id 
AND t1.column_name1 = t2.column_name1;

Here you need id column which is just like auto_increment column and column_name in above query means based on which column you deleting the data you can give (t1.column_name1 = t2.column_name1) and if you deleting data based more than one column you can just add (and t1.column_name2 = t2.column_name2) this to your query

and finally you need to delete the data based on oldest or latest you just change < or > in your query for oldest you can give like this (t1.id > t2.id), for newest you can give like this (t1.id < t2.id)

Airn5475
  • 2,116
  • 25
  • 43