1

So we have a database. Every so often we'll add a new data set and the oldest dataset should be removed.

Say the DB is Early June: 18 records, Late June: 15 records, Aug: 23 records. When we add the next records we want to remove the Early June records. However adding new data isn't totally regular. My second thought is to have an "oldness" field for each record and before adding a new data set, increment all of the "oldness" fields by 1, then removing all the data that has "oldness" of 3.

I'm worried that this is not the optimal way to do this, but I'm not sure it matters since it's a fairly small database that is updated infrequently. That said, I'm certainly open to suggestions for a better algorithm to accomplish this.

aslum
  • 10,651
  • 15
  • 45
  • 67

3 Answers3

2

I'm assuming your data is stored such that it has a timestamp (date) column for each report, and that you always want to remove data that is x (in this case, 3) months old. If so, you might think about using mysql's DATEDIFF function.

For example, it might look something like this:

DELETE from table1 WHERE datediff(CURRENT_DATE(), datecol) > 89

http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_datediff

user937146
  • 238
  • 1
  • 7
  • The problem is that I don't know how old the oldest data will be. Months could be skipped, or there could be 4 updates in a month. – aslum Sep 12 '11 at 18:50
  • Hmm..you could grab the earliest date of an existing record by using the `MIN` function, and specify the last day of that month using `LAST_DAY`. That would then get you midnight on the last day of the oldest used month, so you'd need to add a day to it so you delete records entered on that last day. Something along the lines of this perhaps? `SET @olddate = (SELECT (LAST_DAY(MIN(datecol) + INTERVAL 1 DAY) from table1); DELETE from table1 WHERE datecol < @olddate` – user937146 Sep 12 '11 at 19:15
  • 1
    Again, there can be multiple updates in the same month, but I think `MIN` is the way to go. Find the MIN date, add an hour (just in case the records aren't added exactly simultaneously) and delete older than that. – aslum Sep 12 '11 at 19:22
  • I think I've got something working (thanks to your comment), so if you want to edit your answer (or create a new answer, either way) using `MIN` I'll mark it as accepted answer. – aslum Sep 12 '11 at 19:40
  • 1
    Since my assistance in your discovery of the answer was purely coincidental, it'd probably be more useful if you posted your solution and marked it as the accepted answer. :) – user937146 Sep 12 '11 at 19:53
0

Is it always by month?

UPDATE table SET status = "oldness" WHERE date_inserted <= NOW() - INTERVAL 3 MONTH
Ortiga
  • 7,687
  • 5
  • 37
  • 64
  • Sadly no. There can be multiple updates in a month, or months could be skipped entirely. – aslum Sep 12 '11 at 18:55
  • No problem. This example i posted will run in all rows 3 months old. I mean, there may be a group of data that can start in the middle of a month, and end in the middle of next month? If yes, you may have to add a column to distinguish a group of data. If no, my solution will work. – Ortiga Sep 12 '11 at 20:01
0

With a little help from user937146 I think I've got the solution. Basically

$query = "SELECT MIN(timestamp) FROM new_items WHERE type = 'Book' ";

Then it's just a matter of taking that timestamp, adding an hour (since the data can take more than a second to upload, probably one minute would be sufficient) and then deleting everything from the database older than that timestamp.

Community
  • 1
  • 1
aslum
  • 10,651
  • 15
  • 45
  • 67