Mysql 'insert if not exists', preventing 'overlap'

Question

My general question seems to be answered by https://stackoverflow.com/a/3025332/3650835, but I do not fully understand it after reading the MySql docs and am wondering if my solution will work as I expect, and also wonder if LIMIT 1 is needed.

Goal: to ensure that, for a given user_id, start and end never "overlap". As an example:

test_table

user_id   start   end
4         1       5
4         6       13
4         11      17     --> NOT allowed, bc 11 <= 13
2         1       9      --> allowed, user_id is different

My current solution

/* this should not insert anything, as it would cause an "overlap" of start
and end, based on row 2 having end = 13 */

INSERT INTO `test_table` (user_id, start, end) 
SELECT '4', '11', '17' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '4' AND end >= '11')
LIMIT 1;

Does the WHERE NOT EXISTS section mean "only do this insert...if this following select returns nothing"?

Also, there was the following comment in the linked solution, but I do not understand why based on the MySql docs this would be true. If true, I could remove Limit 1 from my solution:

If you use "from dual" on line 2 instead of "from table", then you don't need the "limit 1" clause

Thanks for your time.

Edit: here is all the sql for testing/setup:

CREATE TABLE `test_table`(
    `id` INT PRIMARY KEY AUTO_INCREMENT,
    `user_id` INT,
    `start` INT,
    `end` INT
);


INSERT INTO `test_table` (user_id, start, end) 
SELECT '4', '1', '5' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '4' AND end >= '1')
LIMIT 1;

INSERT INTO `test_table` (user_id, start, end) 
SELECT '2', '1', '9' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '2' AND end >= '1')
LIMIT 1;

INSERT INTO `test_table` (user_id, start, end) 
SELECT '4', '6', '13' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '4' AND end >= '6')
LIMIT 1;

/* this should not insert anything, as it would cause an "overlap" of start and end */
INSERT INTO `test_table` (user_id, start, end) 
SELECT '4', '11', '17' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '4' AND end >= '11')
LIMIT 1;

What would you want to happen if you tried to insert something like (4, 14, 14) after (4, 15, 17)? Also, no, you should not need the limit; `SELECT x, y, z FROM DUAL` is kind of shorthand for `(SELECT x, y, z) AS notARealTable`. — Uueerdo, Feb 19 '19 at 23:37
Actually, I made a stealth edit. What I was trying to ask was whether inbetweens should be addable after the fact. If inbetweens are not allowed, those queries look fine other than the necessary LIMIT clause. — Uueerdo, Feb 19 '19 at 23:40
@Uueerdo yes I see now. Either case is fine; we'll use server-side code to ensure that case is not allowed. From my understanding of my current sql solution, it would not be allowed (because 17 is >= 14) — KayakinKoder, Feb 19 '19 at 23:42
I mean in the sense of, starting from empty, should these inserts be allowed in the order I list them: (4, 1, 2), (4, 5, 6), (4, 3, 4)? — Uueerdo, Feb 19 '19 at 23:59
@Uueerdo no, the third insert would fail; the third insert's values need to be greater than the second insert's values — KayakinKoder, Feb 20 '19 at 00:00

score 1 · Answer 1 · answered Feb 19 '19 at 23:25

NOT EXISTS means that get results for the inner query and if it returns any rows dont include that related row in the main query. But, your subquery does not have any relation with your main query, which makes me to think that the query may not be generating correct results.

select * from tbl1 
where not exists (
   select 1 from tbl2 where tbl1.id = tbl2.id
)

the above query makes lot more sense. It means that for every record in tbl1 check tbl2 and if any result is found dont include it in the query result.

score 1 · Answer 2 · answered Feb 19 '19 at 23:46

1

It might be easier to understand what your query is doing if it is rewritten like this:

INSERT INTO `test_table` (user_id, start, end) 
SELECT user_id, start, end 
FROM ( SELECT 4 AS `user_id`, 6 AS `start`, 13 AS `end`) AS candidate
WHERE NOT EXISTS (
   SELECT * 
   FROM `test_table` AS t 
   WHERE t.user_id = candidate.user_id AND t.end >= candidate.`end`
)
;

Also, note I removed the single quotes around the numbers; it may or may not be a problem in this case, but in some scenarios that could have resulted in some hard to find bugs where 2 > 11 (if MySQL decided to cast the t.end to a char type to compare with candidate.end).

answered Feb 19 '19 at 23:46

Uueerdo

15,262
1
13
19

Thanks, my sql isn't advanced enough to understand this (particularly `as candidate`) but I will do some reading on it – KayakinKoder Feb 20 '19 at 00:01
"candidate" in this case is just an alias, it lets you treat the value list like a table/subquery and reference it in the WHERE's subquery (rather than copying your static values into it). – Uueerdo Feb 20 '19 at 00:03

Nick · Answer 3 · 2019-02-20T00:09:34.500

1

A select from DUAL will only return ever return one row, so the LIMIT 1 is not required. However if you use your table name, your query will return either as many rows as are in the table, or none, dependent on whether the EXISTS expression returns true or false. So in that case you will need the LIMIT 1.

Your interpretation of what the WHERE NOT EXISTS does is correct.

If (start,end) pairs are only ever inserted in order, your existing test on end is sufficient. If however they might go backwards e.g. (4, 1, 2), (4, 5, 6), (4, 3, 4), then you should change your WHERE clause in the subquery to also test the start value, e.g. the last query should be written as

INSERT INTO `test_table` (user_id, start, end) 
SELECT '4', '11', '17' FROM DUAL 
WHERE NOT EXISTS (SELECT * FROM `test_table` 
      WHERE user_id = '4' AND 
            (start <= 11 AND end >= 11 OR start <= 17 AND end >= 17))

I made a small demo on dbfiddle to show how these work.

edited Feb 20 '19 at 00:09

answered Feb 19 '19 at 23:46

Nick

118,076
20
42
73

Thanks for the detailed explanation, but I think the `start <=` condition would cause failures. For example try values start = 14 and end = 17; these should be inserted, but are not – KayakinKoder Feb 19 '19 at 23:57
@KayakinKoder you are of course correct, the query should be modified to check that the times of any existing entry overlap either the start or end time of the new entry. See my edit – Nick Feb 20 '19 at 00:03
@Nick they are looking to enforce ascending ranges, so only need to check for an existing `end` greater than the incoming `start`. – Uueerdo Feb 20 '19 at 00:05
@Uueerdo I didn't see the exchange you had been having with OP in the comments. You are correct; I will edit. – Nick Feb 20 '19 at 00:07

fifonik · Accepted Answer · 2019-02-21T20:50:17.170

1

I'd use triggers instead:

-- This is your original table create statement
DROP TABLE IF EXISTS `test_table`;
CREATE TABLE `test_table`(
    `id` INT PRIMARY KEY AUTO_INCREMENT,
    `user_id` INT,
    `start` INT,
    `end` INT
);

DELIMITER //
CREATE TRIGGER `TRG_test_table_before_insert` BEFORE INSERT ON `test_table` FOR EACH ROW BEGIN
    SELECT
        COUNT(*) INTO @cnt
    FROM
        test_table
    WHERE
            `user_id` = NEW.user_id
        AND `start` <= NEW.`end`
        AND `end` >= NEW.`start`
    ;

    IF(@cnt > 0) THEN
        SET @msg = CONCAT('TrgErr: overlapping, user_id = ', NEW.user_id, ', start = ', NEW.`start`, ', end = ', NEW.`end`);
        SIGNAL SQLSTATE '45000' SET message_text = @msg;
    END IF;
END//
DELIMITER ;

Then you will be able to use normal insert statements:

INSERT INTO `test_table` (user_id, `start`, `end`) VALUES ('4', '1', '5');
INSERT INTO `test_table` (user_id, `start`, `end`) VALUES ('2', '1', '9');
INSERT INTO `test_table` (user_id, `start`, `end`) VALUES ('4', '6', '13');
INSERT INTO `test_table` (user_id, `start`, `end`) VALUES ('4', '11', '17'); -- this one will not be inserted

In addition, you can implement the similar settings for updates.

P.S. You should check overlapping logic in my code as I do not know if start = end should be allowed or not. P.P.S. Index (user_id, begin) will also help

edited Feb 21 '19 at 20:50

answered Feb 20 '19 at 01:24

fifonik

1,433
1
9
16

Thank you, I like the simpler insert statements – KayakinKoder Feb 20 '19 at 15:41
You can also simplify the in-trigger query as you do not really need to count rows. Something like `SELECT 1 into @exists ... LIMIT 1` might be easier for server. Unfortunately, more details about your data needed to be sure. – fifonik Feb 20 '19 at 21:16
Could you check, I'm getting syntax errors; the one at 7 I know how to fix but the other I'm not sure about. Also yes we don't need to count rows; if there is one single occurrence of a row that matches the conditions in the `where`, that means we should not allow the insert – KayakinKoder Feb 21 '19 at 14:31
I copy/pasted it from my system (I tested it first). I think the issue is -- you have to change delimiter while creating the trigger (this is quite common when you are creating SP/triggers as there are ';' in code that should not be understood by server as separator straight away). I updated my answer to show that. – fifonik Feb 21 '19 at 20:44
Sure COUNT(\*) will do the trick. I meant that query without COUNT(\*) and with LIMIT 1 may have internal optimizations as server will know that it would be just a single row in result and it will not need to continue after first match. – fifonik Feb 21 '19 at 20:46

Mysql 'insert if not exists', preventing 'overlap'

4 Answers4