72

Say I have a simple table that has the following fields:

  1. ID: int, autoincremental (identity), primary key
  2. Name: varchar(50), unique, has unique index
  3. Tag: int

I never use the ID field for lookup, because my application is always based on working with the Name field.

I need to change the Tag value from time to time. I'm using the following trivial SQL code:

UPDATE Table SET Tag = XX WHERE Name = YY;

I wondered if anyone knows whether the above is always faster than:

DELETE FROM Table WHERE Name = YY;
INSERT INTO Table (Name, Tag) VALUES (YY, XX);

Again - I know that in the second example the ID is changed, but it does not matter for my application.

Roee Adler
  • as a practice, I'd never recommend updating the primary key column. – KM. Aug 13 '09 at 12:35
  • @KM: I agree, this is a simplification of my real table, where all the lookup is done on a unique string field that is not the primary key. I do have a primary key int value that is completely irrelevant so I removed it from the example (it's automatically created and does not take part in the lookup at all) – Roee Adler Aug 13 '09 at 12:43
  • @KM: I updated the example to reflect the real table structure, just in case it makes a difference. – Roee Adler Aug 13 '09 at 12:46
  • UPDATE also has the benefit of not breaking any foreign key relations your table might have, as long as the key field being referenced doesn't change. If you DELETE + INSERT, some of your constraints might be violated and thus the DELETE might fail – marc_s Aug 13 '09 at 12:57
  • see my edit, and code examples. on my system the delete/insert took twice as long as the update only. – KM. Aug 13 '09 at 12:59
  • In the context of the question 'is Update always faster' I think most would agree that it is virtually all the time, but there may be a rare and convoluted scenario in which it isn't. – Andrew Aug 13 '09 at 13:59
  • [Related Read](http://stackoverflow.com/a/5034183/550907) – Sisir May 01 '15 at 09:03

15 Answers

77

A bit too late with this answer, but since I faced a similar question, I made a test with JMeter and a MySQL server on the same machine, where I used:

  1. A Transaction Controller (generating a parent sample) that contained two JDBC Requests: a Delete and an Insert statement
  2. A separate JDBC Request containing the Update statement.

After running the test for 500 loops, I obtained the following results:

DEL + INSERT - Average: 62ms

Update - Average: 30ms


Dyptorden
  • I would be curious to see the same stats with a large dataset and indexes. – Michael Silver Apr 03 '20 at 05:35
  • Hi Michael. That test was performed 5 years ago against localhost (on a machine that doesn't exist anymore). For any accuracy, the test should be repeated on identical tables and on indexed tables. The initial idea was just to get a flavor of the performance difference. If time allows I may repeat it and update the post. I am glad that this post still sparks people's curiosity :) – Dyptorden Apr 03 '20 at 12:23
42

The bigger the table (in number and size of columns), the more expensive it becomes to delete and insert rather than update, because you have to pay the price of UNDO and REDO. DELETEs consume more UNDO space than UPDATEs, and your REDO contains twice as many statements as necessary.

Besides, it is plain wrong from a business point of view. Consider how much harder it would be to understand a notional audit trail on that table.


There are some scenarios involving bulk updates of all the rows in a table where it is faster to create a new table using CTAS from the old table (applying the update in the projection of the SELECT clause), drop the old table, and rename the new table. The side effects are having to re-create indexes, manage constraints, and renew privileges, but it is worth considering.
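
A minimal sketch of that approach, assuming Oracle syntax and a hypothetical big_table with a status column being rewritten (the names are illustrative, not from the answer):

-- build the replacement table with the "update" applied in the SELECT projection
CREATE TABLE big_table_new AS
SELECT id,
       name,
       CASE WHEN status = 'OLD' THEN 'NEW' ELSE status END AS status
FROM   big_table;

-- swap the tables; indexes, constraints and grants must then be re-created on the new table
DROP TABLE big_table;
ALTER TABLE big_table_new RENAME TO big_table;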

APC
12

One command on the same row should always be faster than two on that same row. So the UPDATE only would be better.

EDIT: set up the table:

create table YourTable
(YourName  varchar(50)  primary key
,Tag int
)

insert into YourTable values ('first value',1)

Run this, which takes 1 second on my system (SQL Server 2005):

SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
UPDATE YourTable set YourName='new name'  -- prime the row so the loop's WHERE clause matches
-- update the same row 10,000 times
while @x<10000
begin
    Set @x=@x+1
    update YourTable set YourName='new name' where YourName='new name'
    SET @y=@y+@@ROWCOUNT
end
print @y

Run this, which took 2 seconds on my system:

SET NOCOUNT ON
declare @x int
declare @y int
select @x=0,@y=0
-- delete and re-insert the same row 10,000 times
while @x<10000
begin
    Set @x=@x+1
    DELETE YourTable WHERE YourName='new name'
    insert into YourTable values ('new name',1)
    SET @y=@y+@@ROWCOUNT
end
print @y
KM.
  • Are you basing this on any concrete data? – Roee Adler Aug 13 '09 at 12:34
  • @Rax Olgud, how do you answer that? You haven't even said what database you are using. You asked a conceptual question, yet want concrete data. If you want actual data, then you will need to write a while loop (on your system) that updates the row 1000 times, write another loop that will delete/insert it 1000 times, and see what is faster. – KM. Aug 13 '09 at 12:48
  • @Rax Olgud, there is some overhead in removing and creating an index value and checking any constraints. If you are just updating a data column, it will avoid any of this overhead. – KM. Aug 13 '09 at 12:51
  • @Mohy66, the measurement is the time it takes to run, the totaling of the rowcount is to verify the amount of work that was done during the test. Thanks for the down vote. – KM. Dec 04 '18 at 15:31
8

I am afraid the body of your question is unrelated to the title question.

If we answer just the title:

In SQL, is UPDATE always faster than DELETE+INSERT?

then the answer is NO!

Just google for

  • "Expensive direct update"* "sql server"
  • "deferred update"* "sql server"

Such updates are realized internally as a more costly (more processing) delete plus insert rather than as a direct in-place update. These are the cases when (an illustrative sketch follows the list):

  • you update a field covered by a unique (or primary) key, or
  • the new data does not fit (is bigger than) the pre-update row space allocated (or even exceeds the maximum row size), resulting in fragmentation,
  • etc.
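
As an illustrative sketch (my own, not from the answer) of the kind of statement that may fall into this category, assuming a hypothetical MyTable with a uniquely indexed Name varchar(50) column and a Notes varchar(8000) column:

-- changing the uniquely indexed column itself cannot simply overwrite bytes in place
UPDATE MyTable SET Name = 'a considerably longer replacement name' WHERE Name = 'YY';

-- growing a variable-length column past the space left for the row can force row movement
UPDATE MyTable SET Notes = REPLICATE('x', 4000) WHERE Name = 'YY';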

My quick (non-exhaustive) search, which does not pretend to be comprehensive, gave me [1] and [2].

[1]
Update Operations
(Sybase® SQL Server Performance and Tuning Guide
Chapter 7: The SQL Server Query Optimizer)
http://www.lcard.ru/~nail/sybase/perf/11500.htm
[2]
UPDATE Statements May be Replicated as DELETE/INSERT Pairs
http://support.microsoft.com/kb/238254

5

Keep in mind that the actual fragmentation that occurs when DELETE+INSERT is issued, as opposed to a correctly implemented UPDATE, will make a great difference over time.

That's why, for instance, the REPLACE INTO that MySQL implements is discouraged in favor of the INSERT INTO ... ON DUPLICATE KEY UPDATE ... syntax.
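
For reference, a minimal sketch of the two MySQL forms, using the question's columns (the table is called MyTable here since Table is a reserved word; the values are placeholders):

-- REPLACE INTO deletes the conflicting row and inserts a new one (new auto-increment ID, more churn)
REPLACE INTO MyTable (Name, Tag) VALUES ('YY', 42);

-- INSERT ... ON DUPLICATE KEY UPDATE modifies the existing row in place when the unique key collides
INSERT INTO MyTable (Name, Tag) VALUES ('YY', 42)
ON DUPLICATE KEY UPDATE Tag = VALUES(Tag);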

jishi
5

I just tried updating 43 fields in a table with 44 fields; the remaining field was the clustered primary key.

The update took 8 seconds.

The Delete + Insert was faster than the minimum time interval that the "Client Statistics" feature of SQL Management Studio can report.

Peter

MS SQL 2008

4

In your case, I believe the update will be faster.

Remember indexes!

You have defined a primary key, and it will likely automatically become a clustered index (at least SQL Server does that). A clustered index means the records are physically laid out on disk according to the index. The DELETE operation itself won't cause much trouble; even after one record goes away, the index stays correct. But when you INSERT a new record, the DB engine will have to put this record in the correct location, which under some circumstances will cause some "reshuffling" of the old records to "make room" for the new one. That is where it will slow down the operation.

An index (especially a clustered one) works best if the values are ever-increasing, so the new records just get appended to the tail. Maybe you can add an extra INT IDENTITY column to become the clustered index; this will simplify insert operations.
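
A sketch of that layout in SQL Server terms (my own illustration, not code from the answer): the surrogate identity column carries the clustered index, so inserts append at the end, while Name keeps a nonclustered unique index for the lookups.

CREATE TABLE MyTable
(
    ID   int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,  -- ever-increasing, so new rows append at the end
    Name varchar(50) NOT NULL,
    Tag  int NULL
);

-- lookups still go through Name, but its unique index stays nonclustered,
-- so a DELETE+INSERT does not reshuffle the data pages
CREATE UNIQUE NONCLUSTERED INDEX UX_MyTable_Name ON MyTable (Name);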

  • The 'reshuffling' there would be the page split. – Andrew Aug 13 '09 at 12:51
  • In this example, the new record would go at the end of the table (based on the PK), because the user is not specifying the PK. If the "name" index were clustered, then that would be a problem, but it's unlikely to be clustered. – Jack R-G Jan 03 '20 at 14:51
4

The question of speed is irrelevant without a specific speed problem.

If you are writing SQL code to make a change to an existing row, you UPDATE it. Anything else is incorrect.

If you're going to break the rules of how code should work, then you'd better have a damn good, quantified reason for it, and not a vague idea of "This way is faster", when you don't have any idea what "faster" is.

Andy Lester
4

What if you have a few million rows? Each row starts with one piece of data, perhaps a client name. As you collect data for clients, their entries must be updated. Now, let's assume that the collection of client data is distributed across numerous other machines from which it is later collected and put into the database. If each client has unique information, then you would not be able to perform a bulk update; i.e., there is no where-clause criterion for you to use to update multiple clients in one shot. On the other hand, you could perform bulk inserts.

So the question might be better posed as follows: is it better to perform millions of single updates, or is it better to compile them into large bulk deletes and inserts? In other words, instead of running "update [table] set field=data where clientid=123" a million times, you do "delete from [table] where clientid in ([all clients to be updated]); insert into [table] values (data for client1), (data for client2), etc.".

Is either choice better than the other, or are you screwed both ways?
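
To make the two shapes concrete, a hedged sketch (table and column names are my own; multi-row VALUES syntax depends on the engine):

-- option 1: one single-row update per client, repeated millions of times
UPDATE client_data SET field = 'data for client 123' WHERE clientid = 123;

-- option 2: one bulk delete followed by one multi-row insert
DELETE FROM client_data
WHERE clientid IN (123, 124, 125);   -- all clients to be updated

INSERT INTO client_data (clientid, field) VALUES
    (123, 'data for client 123'),
    (124, 'data for client 124'),
    (125, 'data for client 125');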

brian
3

Delete + Insert is almost always faster because an Update has way more steps involved.

Update:

  1. Look for the row using PK.
  2. Read the row from disk.
  3. Check for which values have changed
  4. Raise the onUpdate Trigger with populated :NEW and :OLD variables
  5. Write New variables to disk (The entire row)

    (This repeats for every row you're updating)

Delete + Insert:

  1. Mark rows as deleted (Only in the PK).
  2. Insert new rows at the end of the table.
  3. Update PK Index with locations of new records.

    (This doesn't repeat; all of it can be performed in a single block of operations.)

Using Delete + Insert will fragment your file system, but not that fast. Doing a lazy optimization in the background will always free unused blocks and pack the table back together.

jri
  • This answer over-simplifies the operations and misses out a lot of steps for the main commercial RDBMSs - deleting a row by just altering the PK (and nothing else) is not how the main commercial RDBMSs work. Your information on triggers is incorrect and one-sided. For a start, the delete/insert could/would also fire triggers - but you fail to include those. Unless you specify a per-row trigger, it will also fire just once for the update and twice for the delete/insert. – Andrew Oct 18 '15 at 20:15
2

Obviously, the answer varies based on what database you are using, but an UPDATE can always be implemented faster than DELETE+INSERT. Since in-memory ops are mostly trivial anyway, given a hard-drive-based database an UPDATE can change a database field in place on the HDD, while a delete would remove a row (leaving an empty space) and insert a new row, perhaps at the end of the table (again, it's all in the implementation).

The other, minor, issue is that when you UPDATE a single value in a single row, the other columns in that row remain the same. If you DELETE and then do an INSERT, you run the risk of forgetting about other columns and consequently leaving them behind (in which case you would have to do a SELECT before your DELETE to temporarily store your other columns before writing them back with the INSERT).
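
A quick illustration of that second point, using the question's columns plus a hypothetical CreatedAt column (my own example):

-- UPDATE touches only Tag; every other column on the row keeps its value
UPDATE MyTable SET Tag = 42 WHERE Name = 'YY';

-- DELETE + INSERT must re-supply every column explicitly;
-- forget CreatedAt here and its original value is gone (or falls back to a default/NULL)
DELETE FROM MyTable WHERE Name = 'YY';
INSERT INTO MyTable (Name, Tag, CreatedAt) VALUES ('YY', 42, '2009-08-13');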

erjiang
  • I'm not sure I agree with you regarding the first point, especially when using variable length string types. Updating those may indeed require HD writes in "new places". – Roee Adler Aug 13 '09 at 12:40
1

It depends on the product. A product could be implemented that (under the covers) converts all UPDATEs into a (transactionally wrapped) DELETE and INSERT. Provided the results are consistent with the UPDATE semantics.

I'm not saying I'm aware of any product that does this, but it's perfectly legal.

Damien_The_Unbeliever
  • ... perfectly legal, as long as the Foreign-Key constraint-checking is deferred until after the insert, which may not be legal. – Mr. Shiny and New 安宇 Aug 13 '09 at 14:21
  • I am not sure but I have heard that SQL Server does a DELETE+INSERT for UPDATE, internally. If that is the case, will it make any difference in case of SQL Server? – Faiz Oct 15 '09 at 05:11
  • @Faiz - as with everything, the only way to be sure is to test with your data, in your environment. The underlying cost of these operations is unlikely to be your bottleneck - ever. With SQL Server, if you've got a trigger, it certainly *resembles* a delete/insert, but whether that's what the system actually does, who needs to know :-) – Damien_The_Unbeliever Oct 15 '09 at 07:03
1

Every write to the database has lots of potential side effects.

  • Delete: a row must be removed, indexes updated, foreign keys checked and possibly cascade-deleted, etc.
  • Insert: a row must be allocated - this might be in place of a deleted row, might not be; indexes must be updated, foreign keys checked, etc.
  • Update: one or more values must be updated; perhaps the row's data no longer fits in that block of the database, so more space must be allocated, which may cascade into multiple blocks being re-written or lead to fragmented blocks; if the value has foreign key constraints they must be checked, etc.

For a very small number of columns, or if the whole row is updated, Delete+Insert might be faster, but the FK constraint problem is a big one. Sure, maybe you have no FK constraints now, but will that always be true? And if you have a trigger, it's easier to write code that handles updates if the update operation is truly an update.
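
A sketch of why the FK problem bites (my own example; exact constraint syntax varies by engine): a child row referencing the parent blocks the DELETE half of DELETE+INSERT, while an UPDATE of a non-key column goes through.

CREATE TABLE parent (name varchar(50) PRIMARY KEY, tag int);
CREATE TABLE child  (id int PRIMARY KEY, parent_name varchar(50) REFERENCES parent(name));

INSERT INTO parent VALUES ('YY', 1);
INSERT INTO child  VALUES (1, 'YY');

UPDATE parent SET tag = 2 WHERE name = 'YY';   -- fine: the referenced key value is untouched
DELETE FROM parent WHERE name = 'YY';          -- fails: a child row still references 'YY'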

Another issue to think about is that sometimes inserting and deleting hold different locks than updating. The DB might lock the entire table while you are inserting or deleting, as opposed to just locking a single record while you are updating that record.

In the end, I'd suggest just updating a record if you mean to update it. Then check your DB's performance statistics and the statistics for that table to see if there are performance improvements to be made. Anything else is premature.

An example from the ecommerce system I work on: we were storing credit-card transaction data in the database in a two-step approach: first, write a partial transaction to indicate that we've started the process; then, when the authorization data is returned from the bank, update the record. We COULD have deleted and then re-inserted the record, but instead we just used update. Our DBA told us that the table was fragmented because the DB was only allocating a small amount of space for each row, and the update caused block-chaining since it added a lot of data. However, rather than switch to DELETE+INSERT, we just tuned the database to always allocate the whole row, which means the update could use the pre-allocated empty space with no problems. No code change was required, and the code remains simple and easy to understand.

Mr. Shiny and New 安宇
1

A large number of individual updates vs. bulk delete/bulk insert is my scenario. I have historical sales data for multiple customers going back years. Until I get verified data (the 15th of the following month), I will adjust sales numbers every day to reflect the current state as obtained from another source (this means overwriting at most 45 days of sales each day for each customer). There may be no changes, or there may be a few changes. I can either code the logic to find the differences and update/delete/insert the affected records, or I can just blow away yesterday's numbers and insert today's numbers. Clearly this latter approach is simpler, but if it's going to kill the table's performance due to churn, then it's worth it to write the extra logic to identify the handful (or none) of records that changed and only update/delete/insert those.

So, I'm replacing the records, and there may be some relationship between the old records and the new records, but in general I don't necessarily want to match the old data with the new data (that would be an extra step and would result in deletions, updates, and inserts). Also, relatively few fields would be changed (at most 7 out of 20 or 2 out of 15).

The records that are likely to be retrieved together will have been inserted at the same time and therefore should be physically close to each other. Does that make up for the performance loss due to churn with that approach, and is it better than the undo/redo cost of all those individual record updates?

Jack R-G
0

In specific cases, Delete+Insert can save you time. I have a table that has 30,000-odd rows, and there is a daily update/insert of these records from a data file. The upload process generates 95% UPDATE statements, since the records are already there, and 5% INSERTs for records that do not exist. Alternatively, uploading the data file records into a temp table, deleting from the destination table the records that exist in the temp table, and then inserting the same rows from the temp table has shown a 50% gain in time.
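
A rough sketch of that load pattern, with hypothetical table and column names (the answer does not give its actual schema):

-- bulk-load the daily data file into staging_table first (e.g. BULK INSERT / LOAD DATA, depending on the engine)

-- remove the destination rows that are about to be replaced
DELETE FROM destination_table
WHERE record_key IN (SELECT record_key FROM staging_table);

-- re-insert everything from the staging table in one set-based statement
INSERT INTO destination_table (record_key, col1, col2)
SELECT record_key, col1, col2
FROM staging_table;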

shan