3

I currently have PHP code that handles the logic for this because I do not know how to handle it in SQL. I want to create a stored procedure that will delete all the rows except for the 5 newest for a given config_id. IE config_id = 5 gets passed to the SP so it knows which config_id it is looking to clean up.

CREATE TABLE  `TAA`.`RunHistory` (
  `id` int(11) NOT NULL auto_increment,
  `start_time` datetime default NULL,
  `stop_time` datetime default NULL,
  `success_lines` int(11) default NULL,
  `error_lines` int(11) default NULL,
  `config_id` int(11) NOT NULL,
  `file_id` int(11) NOT NULL,
  `notes` text NOT NULL,
  `log_file` longblob,
  `save` tinyint(1) NOT NULL default '0',
  PRIMARY KEY  (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=128 DEFAULT CHARSET=utf8;

Newest will be determined by start_time, if a stop_time is null but NOT the newest it should be deleted (stop_time can be null if a run was unceremoniously killed).

ashurexm
  • 5,921
  • 3
  • 42
  • 69
  • How should "newest" be determined? Based on start_time? Based on stop_time? Based on id is not fool proof but that is another choice. – Thomas Apr 28 '10 at 16:26
  • @Thomas, he said based on stop_time. Though it's not clear what should happen to records that have a null in that field. – FogleBird Apr 28 '10 at 16:29
  • I apologize- I was interrupted ten times during the initial post and didn't re-read. It was an awfully posed question at first. – ashurexm Apr 28 '10 at 16:44
  • @Thomas: Initially it said stop_time, but I did mean start_time because stop_time CAN be null under very normal usage. – ashurexm Apr 28 '10 at 16:45
  • possible duplicate of [SQL query: Delete all records from the table except latest N?](http://stackoverflow.com/questions/578867/sql-query-delete-all-records-from-the-table-except-latest-n) – e-sushi Nov 23 '13 at 03:56

4 Answers4

12

From SQL query: Delete all records from the table except latest N?:

DELETE FROM `runHistory`
WHERE id NOT IN (
  SELECT id
  FROM (
    SELECT id
    FROM `runHistory`
    ORDER BY start_time DESC
    LIMIT 5 
  ) foo
);
Community
  • 1
  • 1
Luis Melgratti
  • 11,101
  • 2
  • 27
  • 32
  • @Luis Melgratti: If I have a large set of IDs, will the `IN` statement be slower than other ways of doing this? This also shows that a null stop_time is irrelevant for the purposes of cleaning up the list. – ashurexm Apr 28 '10 at 17:00
  • @Luis Melgratti: When I run that query I get the following error: `This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'` MySQL version = 5.0.45 – ashurexm Apr 30 '10 at 00:43
  • @Theo, my way is much simpler and does the same thing. – Delan Azabani May 10 '10 at 02:13
  • 1
    @Delan, then how come you've removed your answer? – Theo May 10 '10 at 11:26
  • @manyxcxi What about using a temporary table then? Since you're planning on a stored procedure you could create the temp table, insert the data into it (ordered and limited) and then delete from the `runHistory` table without having to worry about the LIMIT & IN error. Then delete the temporary table afterwards. – Blair McMillan May 10 '10 at 15:54
  • @Theo; I realise that my answer was wrong now as I hadn't read the question correctly previously. Thank you for your understanding. – Delan Azabani May 11 '10 at 01:54
1

Here's a procedure I tested on MySQL 5.1.46, but it uses no subqueries so you won't get the error about no support for LIMIT in a subquery.

CREATE PROCEDURE DeleteBut5(IN c INT) BEGIN
  DECLARE i INT;
  DECLARE s DATETIME;

  SELECT id, stop_time INTO i, s
  FROM RunHistory WHERE config_id = c
  ORDER BY stop_time DESC, id DESC
  LIMIT 4, 1;

  DELETE FROM RunHistory WHERE stop_time < s OR stop_time = s AND id < i;
END

I recommend you create this covering index:

CREATE INDEX cov ON RunHistory (config_id, stop_time, id);
Bill Karwin
  • 462,430
  • 80
  • 609
  • 762
0

https://stackoverflow.com/a/8303440/2576076 is good solution. If your table has large number of rows, it's even more better.

Community
  • 1
  • 1
vr3C
  • 1,606
  • 18
  • 15
0
begin;
declare v_start_time datetime;
declare v_id int;
#Find the id of the newest run
select id into v_id from RunHistory where start_time = (select max(start_time) from RunHistory);
#delete null stop times except for the newest run
delete from RunHistory where stop_time is null and id != v_id;
#first row is 0... skip 0-4, show 5
select start_time into v_start_time from RunHistory order by stop_time desc limit 4,1;
delete from RunHistory where start_time < v_start_time;
end;

There you go. I suggest indexing start_time. Stop_time may or may not be worth indexing. It's probably not. You can optimize that delete statement by changing it to the following, since we'll delete anything past the first five anyway:

delete from RunHistory where stop_time is null and id != v_id order by start_time desc limit 5;
Jeff Ferland
  • 16,762
  • 5
  • 42
  • 72