25

Here's the problem I'm trying to solve: I recently completed a data-layer redesign that allows me to load-balance my database across multiple shards. To keep the shards balanced, I need to be able to migrate data from one shard to another, which involves copying from shard A to shard B and then deleting the records from shard A. But I have several tables that are very big and have many foreign keys pointing to them, so deleting a single record from one of those tables can take more than one second.

In some cases I need to delete millions of records from the tables, and it just takes too long to be practical.

Disabling foreign keys is not an option. Deleting large batches of rows is also not an option, because this is a production application and large deletes lock too many resources, causing failures. I'm using SQL Server, and I know about partitioned tables, but the restrictions on partitioning (and the license fees for Enterprise Edition) rule it out.

When I began working on this problem I thought the hard part would be writing the algorithm that figures out how to delete rows from the leaf level up to the top of the data model, so that no foreign key constraints get violated along the way. But solving that problem did me no good since it takes weeks to delete records that need to disappear overnight.
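To give a feel for what that ordering looks like, here's a grossly simplified fragment with made-up table names (the real model has far more tables):

DECLARE @MovedShard int
SET @MovedShard = 2

-- Children first, parents last, so no foreign key constraint is violated along the way.
DELETE oi
FROM OrderItems oi
JOIN Orders o ON o.OrderId = oi.OrderId
JOIN Customers c ON c.CustomerId = o.CustomerId
WHERE c.ShardKey = @MovedShard

DELETE o
FROM Orders o
JOIN Customers c ON c.CustomerId = o.CustomerId
WHERE c.ShardKey = @MovedShard

DELETE FROM Customers WHERE ShardKey = @MovedShard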

I already built in a way to mark data as virtually deleted, so as far as the application is concerned, the data is gone, but I'm still dealing with large data files, large backups, and slower queries because of the sheer size of the tables.
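The virtual delete is just a flag plus a filter, roughly along these lines (again with made-up names):

-- Rows are flagged rather than removed; existing rows pick up the default of 0.
ALTER TABLE Customers ADD IsVirtuallyDeleted bit NOT NULL DEFAULT 0
GO

-- Marking a shard's data as "gone" is instant as far as the application is concerned...
UPDATE Customers SET IsVirtuallyDeleted = 1 WHERE ShardKey = 2

-- ...because every application query filters on the flag.
SELECT CustomerId, Name FROM Customers WHERE IsVirtuallyDeleted = 0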

Any ideas? I have already read older related posts here and found nothing that would help.

Eric Z Beard
  • Why can you not temporarily disable foreign keys? Can you not run this during a maintenance period and disable access to the system? – cjk Jul 21 '09 at 12:29
  • Would it be an option to remove the need to do this copy process altogether? By ensuring via some algorithm that records are divided equally among the shards from the word go, rather than having to move records around once they've been created to balance them up. – AdaTheDev Jul 21 '09 at 12:35
  • @AdaTheDev, for new shards it's not such a problem but my initial deployment was a backup-restore, and on each copy, mark half of the data as virtually deleted. So those two shards are huge. @Mitch, the point of shards is to use cheaper equipment, so I won't spend money to solve the problem. @ck, I'm not sure the FKs are the real problem. I think the biggest percentage of time spent is the clustered index delete. – Eric Z Beard Jul 21 '09 at 13:23
  • Can you post an execution plan here? To get one, press Control-L when your query is loaded in SSMS, and then take a screenshot (assuming it all fits on one screen). If it's too big, see if you can get a text-based copy of the plan. – SqlRyan Jul 24 '09 at 14:34
  • I don't want to post the actual plan but I can describe it easily enough: a clustered index delete at far right with 66%, then under that 29 seeks for the FKs at approximately 1% each. – Eric Z Beard Jul 25 '09 at 11:40

8 Answers

30

Please see: Optimizing Delete on SQL Server

This MS support article might be of interest: How to resolve blocking problems that are caused by lock escalation in SQL Server:

Break up large batch operations into several smaller operations. For example, suppose you ran the following query to remove several hundred thousand old records from an audit table, and then you found that it caused a lock escalation that blocked other users:

DELETE FROM LogMessages WHERE LogDate < '2/1/2002'    

By removing these records a few hundred at a time, you can dramatically reduce the number of locks that accumulate per transaction and prevent lock escalation. For example:

SET ROWCOUNT 500
delete_more:
     DELETE FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more
SET ROWCOUNT 0

Reduce the query's lock footprint by making the query as efficient as possible. Large scans or large numbers of Bookmark Lookups may increase the chance of lock escalation; additionally, it increases the chance of deadlocks, and generally adversely affects concurrency and performance.
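For example, assuming the LogMessages table and the LogDate predicate from the snippet above, a supporting index (hypothetical name) lets each small batch seek straight to its rows instead of scanning:

-- Hypothetical supporting index: each batched DELETE can seek on LogDate
-- rather than scan the whole table, keeping the per-batch lock count low.
CREATE NONCLUSTERED INDEX IX_LogMessages_LogDate
    ON LogMessages (LogDate);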

Mitch Wheat
  • That's just what I was going to suggest. – HLGEM Jul 21 '09 at 13:44
  • @crokusek, to add to your comment, it won't affect this particular batch since `SET ROWCOUNT 500` makes it 500 for this batch. Any batches that come after may be affected if they expect `ROWCOUNT` (different from `@@ROWCOUNT`) to be something else. – Dennis T --Reinstate Monica-- Feb 09 '15 at 22:30
  • This is simply amazing. Never thought to look at the problem this way. All I can say is wow! – pim Mar 01 '17 at 15:14
18

delete_more:
     DELETE TOP(500) FROM LogMessages WHERE LogDate < '2/1/2002'
IF @@ROWCOUNT > 0 GOTO delete_more

You could achieve the same result using SET ROWCOUNT, as Mitch suggested, but according to MSDN it will no longer affect DELETE (and some other statements) in a future version of SQL Server:

Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax. For more information, see TOP (Transact-SQL).

foobarcode
  • And ROWCOUNT carries scoping implications: http://stackoverflow.com/questions/5383761/scope-of-set-rowcount-in-sql. For instance, how would you know it was zero to begin with when it is restored? – crokusek Nov 19 '13 at 01:49
2

You could create new tables, copy across everything except the "deleted" rows, then swap the names so the new tables take the old tables' place, and finally drop the old tables. If you're deleting a large percentage of the records, this may actually be faster.
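A rough sketch of that approach, with illustrative names (note that SELECT INTO doesn't copy indexes, constraints or foreign keys, so those would have to be recreated on the new table before the swap):

-- Copy only the rows you want to keep into a new table.
SELECT *
INTO dbo.BigTable_New
FROM dbo.BigTable
WHERE IsDeleted = 0;

-- Swap the names, then drop the old table.
EXEC sp_rename 'dbo.BigTable', 'BigTable_Old';
EXEC sp_rename 'dbo.BigTable_New', 'BigTable';
DROP TABLE dbo.BigTable_Old;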

seanyboy
  • That might be something I could try, but we're talking about tables with tens of millions of records, several Gigs for the clustered index. It would have to be possible inside of a normal maintenance window. – Eric Z Beard Jul 21 '09 at 13:27
1

Another suggestion is to rename the table and add a status column. When status = 1 (deleted), you don't want the row to show, so you create a view with the same name as the original table that selects only rows where status is NULL or 0 (depending on how you implement it). The deletion appears immediate to the user, and a background job can run every fifteen minutes physically deleting the flagged records without anyone other than the DBAs being aware of it.
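A minimal sketch of that pattern, with illustrative names:

-- Rename the real table and add a nullable status column.
EXEC sp_rename 'dbo.Widgets', 'Widgets_Data';
ALTER TABLE dbo.Widgets_Data ADD Status tinyint NULL;
GO

-- A view with the original name hides the "deleted" rows from the application.
CREATE VIEW dbo.Widgets
AS
SELECT WidgetId, Name, Status   -- list the table's real columns here
FROM dbo.Widgets_Data
WHERE Status IS NULL OR Status = 0;
GO

-- The background job then physically removes flagged rows in small batches.
DELETE TOP (500) FROM dbo.Widgets_Data WHERE Status = 1;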

HLGEM
0

If you're using SQL Server 2005 or 2008, perhaps "snapshot isolation" would help you. It allows the existing data to remain visible to users while an underlying data-modification operation is in progress, and reveals the changes as soon as they're committed. Even if your delete takes 30 minutes to run, your applications would stay online during that time.

Here's a quick primer on snapshot isolation:

http://www.mssqltips.com/tip.asp?tip=1081

Though you should still try to speed up your delete so it's as quick as possible, this may alleviate some of the burden.
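Turning it on looks roughly like this (the database name is just a placeholder):

-- Allow transactions to request SNAPSHOT isolation.
ALTER DATABASE MyShardDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optionally make the default READ COMMITTED level use row versioning too,
-- so readers aren't blocked by the long-running delete. (This statement needs
-- the database to be briefly free of other active connections.)
ALTER DATABASE MyShardDb SET READ_COMMITTED_SNAPSHOT ON;

-- Individual sessions can also opt in explicitly:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;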

SqlRyan
  • The problem is that I'm not talking about 30 minutes. I'm talking about tens of millions of rows which are taking me more than 1 second per row to delete. That adds up to months. – Eric Z Beard Jul 24 '09 at 13:01
  • 1 second per row to delete is too much. What hardware? – TomTom Mar 27 '10 at 13:44
  • @TomTom: While that's really slow, it's not inconceivable. Maybe the hardware is slow, or high-contention, or maybe the deletes are cascading, or there's a clustered index on the table you're deleting from that's causing a lot of physical data movement, or perhaps some delete triggers that are busy running for each row. While you can definitely do some things to try and address it, there may be reasons why it's that slow. – SqlRyan Mar 27 '10 at 17:29
  • Yes, but checking hardware first is never a bad idea. From there I would move on to things like the delete query plan (nice - it shows the triggers). – TomTom Mar 27 '10 at 18:47
0

You can delete small batches using a while loop, something like this:

DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (10000) FROM LogMessages WHERE LogDate < '2/1/2002'
END
Fábio Nascimento
0

If a sizeable percentage of the table is going to match the deletion criteria (near or over 50%), then it is "cheaper" to create a temporary table with the records that are not going to be deleted (reverse the WHERE criteria), truncate the original table and then repopulate it with the records that were intended to be kept.

-- Instead of this:
DELETE FROM MyBigTable WHERE ROW_TO_DELETE = 'OK';
GO

-- Do this:
SELECT * INTO #KeepRows FROM MyBigTable WHERE ROW_TO_DELETE <> 'OK';  -- copy the rows to keep
TRUNCATE TABLE MyBigTable;   -- fast and minimally logged (fails if foreign keys reference the table)
INSERT INTO MyBigTable SELECT * FROM #KeepRows;   -- put the kept rows back
DROP TABLE #KeepRows;
GO
alejandrob
-1

Here is the solution to your problem:

DECLARE @RC AS INT
SET @RC = -1

WHILE @RC <> 0
BEGIN
    -- Delete in batches and loop until no matching rows remain.
    DELETE TOP(1000000) FROM [Archive_CBO_ODS].[CBO].[AckItem] WHERE [AckItemId] >= 300
    SET @RC = @@ROWCOUNT
END
FelixSFD