
I found a not-so-nice method for checking whether a row already exists: if it does exist, the row isn't added, which avoids duplicates. Am I completely crazy to rely on this method, or should I go the old-fashioned way and check whether the row exists BEFORE trying to insert it into the database?

The table is VERY simple :)

-ID [PK]
-Message
-Hashed_message [UNIQUE] (stored procedure, takes message and hashes it upon insert)

Now when I try to insert a new row, I would run something like `insert into .... message = xxx`

Upon insertion, MySQL will create a hash of the message automatically. Since it's a unique column, if the hash already exists in the database the insert will simply throw an error, and no duplicates will ever exist... I hope.

The reason for using hashes is simply to avoid checking for duplicates by comparing every large message; I thought a short hash would be easier to check.

So is this method bad for avoiding duplicates? I could manually create the hash of my message before the insert, check whether that hash exists, and THEN insert the message, but I would hope to avoid having to replicate the stored procedure's hashing function in PHP as well.

Quick note: there is a similar thread about inserting and then ignoring the error on duplicates, but this one is about how that is handled when a derived column (stored procedure) is used to accomplish it.

Shadow
  • 1
    Use `INSERT IGNORE INTO ...`. This converts the unique constraint violation error into a warning. If the query executes without error, the row exists afterwards regardless of whether it was inserted or rejected as a duplicate (you can distinguish these cases by running `SHOW WARNINGS` immediately after the query, if needed). If the query still produces an error, the cause is something other than a duplicate violation. PS. I think code that produces an error during normal process flow is illogical. – Akina Apr 15 '21 at 07:59
  • 1
    Test then insert gives rise to a race condition where two processes might attempt the same insertion. INSERT and test for errors avoids that possibility. You could lock the entire table, but that's possibly a problem if the system is busy enough for a race condition to be a problem. – Tangentially Perpendicular Apr 15 '21 at 08:21
  • 1
    I'm sorry to point out, but your question is not in any way different from checking for a duplicate before a plain insert because at the end your stored procedure will execute a plain insert into a table. – Shadow Apr 15 '21 at 09:14
  • I'd consider using a key (even a composite one) to identify when a record is duplicated and using `INSERT ... ON DUPLICATE KEY UPDATE id=id`, which avoids inserting the record if it already exists, without the need for error or warning checking – Lelio Faieta Apr 15 '21 at 09:16

1 Answer


If the hashed message has to be unique, create a key on that column with the UNIQUE constraint, so there can never be two rows with the same hash.

Then, when you insert a new row, change your query to the following:

INSERT INTO table SET message='$message', hashed_message='$hashed_message'
ON DUPLICATE KEY UPDATE id=id;

This will perform an insert if the hashed_message is not already in the table. Otherwise it will leave the existing row unchanged.
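The "insert, or do nothing on a duplicate" behaviour can be tried out without a MySQL server. The following is only a minimal sketch using Python's built-in `sqlite3` module as a stand-in: SQLite spells the statement `INSERT OR IGNORE` rather than MySQL's `INSERT IGNORE` or a no-op `ON DUPLICATE KEY UPDATE`, and the `messages` table name, the SHA-256 hash and the `insert_message` helper are assumptions made for illustration, not part of the original setup.

```python
import hashlib
import sqlite3


def sha256_hex(text: str) -> str:
    """Hash the message in the application (assumed SHA-256, hex encoded)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# In-memory SQLite database standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY,"
    " message TEXT NOT NULL,"
    " hashed_message TEXT NOT NULL UNIQUE)"
)


def insert_message(conn: sqlite3.Connection, message: str) -> None:
    """Insert the message; a duplicate hash is silently skipped by the database."""
    conn.execute(
        "INSERT OR IGNORE INTO messages (message, hashed_message) VALUES (?, ?)",
        (message, sha256_hex(message)),
    )
    conn.commit()


insert_message(conn, "hello")
insert_message(conn, "hello")  # same hash, so no second row is created
insert_message(conn, "world")

row_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(row_count)  # 2
```

The values are passed as bound parameters instead of being interpolated into the SQL string, and the duplicate check is left entirely to the UNIQUE constraint.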

If you want to update something when a duplicate is found, your query becomes:

INSERT INTO table SET message='$message', hashed_message='$hashed_message'
ON DUPLICATE KEY UPDATE message='$updated_message'

This is just an example.
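To see the "update on duplicate" variant in action, here is a rough sketch, again using Python's `sqlite3` module as a stand-in for MySQL. SQLite's equivalent of `ON DUPLICATE KEY UPDATE` is `ON CONFLICT(...) DO UPDATE`, which needs SQLite 3.24 or newer. The `times_seen` counter column and the `upsert_message` helper are hypothetical and exist only to make the update visible.

```python
import hashlib
import sqlite3

# In-memory SQLite database; times_seen is a hypothetical column for this demo.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY,"
    " message TEXT NOT NULL,"
    " hashed_message TEXT NOT NULL UNIQUE,"
    " times_seen INTEGER NOT NULL DEFAULT 1)"
)


def upsert_message(conn: sqlite3.Connection, message: str) -> None:
    """Insert a new message, or bump times_seen if its hash already exists."""
    digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO messages (message, hashed_message) VALUES (?, ?) "
        "ON CONFLICT(hashed_message) DO UPDATE SET times_seen = times_seen + 1",
        (message, digest),
    )
    conn.commit()


upsert_message(conn, "hello")
upsert_message(conn, "hello")  # duplicate: existing row is updated instead
upsert_message(conn, "world")

rows = conn.execute(
    "SELECT message, times_seen FROM messages ORDER BY id"
).fetchall()
print(rows)  # [('hello', 2), ('world', 1)]
```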

Note that this method won't raise any exception for duplicate values: you need extra logic if your frontend has to react to duplicates (e.g. by showing a message to the user).
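If the application does need to know whether a message was a duplicate, one option is to keep the plain INSERT from the question and catch the unique-constraint error, which also avoids the race condition of checking first and inserting afterwards. The sketch below assumes Python's `sqlite3` module as a stand-in for MySQL; the `messages` table and the `try_insert` helper are made up for illustration, and a MySQL client library would raise its own duplicate-key error instead of `sqlite3.IntegrityError`.

```python
import hashlib
import sqlite3

# In-memory SQLite database standing in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages ("
    " id INTEGER PRIMARY KEY,"
    " message TEXT NOT NULL,"
    " hashed_message TEXT NOT NULL UNIQUE)"
)


def try_insert(conn: sqlite3.Connection, message: str) -> bool:
    """Return True if the row was inserted, False if its hash already existed."""
    digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
    try:
        # "with conn" commits on success and rolls back if the insert fails.
        with conn:
            conn.execute(
                "INSERT INTO messages (message, hashed_message) VALUES (?, ?)",
                (message, digest),
            )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected the row: treat it as a duplicate.
        return False
    return True


results = [try_insert(conn, m) for m in ("hello", "hello", "world")]
print(results)  # [True, False, True]
```

The boolean result can then drive whatever the frontend should do, such as telling the user the message already exists.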

More details here

Lelio Faieta