
Our database has a very large table with several million rows of data. Some of the old code that touches it was written naively by me several years ago and doesn't handle poor database performance. The database itself was also designed at a point when my real-world knowledge was limited.

This table (by necessity of the use case) accumulates large numbers of superfluous rows that should (I think) be cleaned up over time. Eventually, I'll write some routines to do this cleanup automatically on a daily basis (thus decreasing the cost).

In the meantime, however, I need to come up with an optimization approach. I don't have any experience writing indexes for a database, so I think I'm going to have to do some learning there.
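For example (just a rough sketch on my part — the index name is made up, and the columns are taken from the WHERE clause of the delete shown below), I gather something like a filtered index might be the sort of thing to look into:

-- Rough sketch: a filtered nonclustered index matching the delete's predicate.
-- The index name is hypothetical; ONLINE = ON is available on Azure SQL Database.
CREATE NONCLUSTERED INDEX IX_ShiftOffer_Cleanup
    ON [Shift Offer] (callout_id_fk)
    WHERE offer_timestamp IS NULL
      AND call_round_id_fk IS NULL
    WITH (ONLINE = ON);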

My approach thus far has been to delete the superfluous rows like so:

SET NOCOUNT ON;

DECLARE @r INT = 1;

WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;

    -- Delete in small batches so each transaction stays short.
    DELETE TOP (100)
    FROM [Shift Offer]
    WHERE offer_timestamp IS NULL
      AND callout_id_fk < 18605
      AND call_round_id_fk IS NULL;

    SET @r = @@ROWCOUNT;

    COMMIT TRANSACTION;
END

Previously, I had TOP (1000) set, which caused some pretty serious performance degradation.
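One variation I'm considering (just a sketch — I haven't verified it helps on our workload) is pausing briefly between batches so the log and other sessions can keep up:

SET NOCOUNT ON;
DECLARE @r INT = 1;

WHILE @r > 0
BEGIN
    DELETE TOP (100)
    FROM [Shift Offer]
    WHERE offer_timestamp IS NULL
      AND callout_id_fk < 18605
      AND call_round_id_fk IS NULL;

    SET @r = @@ROWCOUNT;

    -- Brief pause between batches to give other sessions a chance to get in.
    WAITFOR DELAY '00:00:01';
END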

After doing some reading, I saw that viewing the execution plan can give some insight here. Looking at the plan, I think the problem with this query is that the clustered index is causing slow write operations.

[Screenshot: execution plan for the DELETE statement]
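For what it's worth, the way I've been measuring the cost of each batch (standard session settings, nothing specific to our schema) is roughly this:

-- Show logical reads and elapsed time for each statement in this session,
-- which makes it easier to compare batch sizes than eyeballing the plan alone.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

DELETE TOP (100)
FROM [Shift Offer]
WHERE offer_timestamp IS NULL
  AND callout_id_fk < 18605
  AND call_round_id_fk IS NULL;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;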

The table is denormalized, so it's not doing a ton of joins (if any) during read or update operations. Each chunk of the data (identified by callout_id_fk) is only worked on for a few days at most, and after that is kept only for record-keeping purposes.

As the table has grown, though, some performance issues have arisen, one of which I was actually able to reproduce when I accidentally degraded performance with my delete operation. So that tells me we certainly need to do some database tuning here, in addition to making the application code a little more robust about handling bad performance.

So I'm left with these questions: Is deleting the offending superfluous rows a bad approach? Could performance be improved by putting some thought into indexing this table ourselves (rather than letting Azure do the indexing)? Should I both delete the rows and create indexes?

Lastly, should I drop the indexes, do the delete operation, and then recreate each index? I'm unsure whether dropping the indexes would exacerbate the performance issue while this operation runs, so I'm curious what other folks think is a good approach here.
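If dropping and recreating turns out to be the way to go, I assume it would look roughly like this (the index name here is hypothetical; I'd script out the real definitions first so they can be recreated exactly):

-- Hypothetical nonclustered index on the table.
DROP INDEX IX_ShiftOffer_SomeIndex ON [Shift Offer];

-- ... run the batched DELETE loop here ...

-- Rebuild afterwards; ONLINE = ON keeps the table readable on Azure SQL.
CREATE NONCLUSTERED INDEX IX_ShiftOffer_SomeIndex
    ON [Shift Offer] (callout_id_fk)
    WITH (ONLINE = ON);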

Scuba Steve
  • I see a close vote here, so if an admin sees this as being more appropriate for dba.stackexchange, feel free to move this there. – Scuba Steve Sep 24 '18 at 19:50
  • So I managed to drastically improve the speed of the seek. I experimented with SELECT TOP (100) queries using the same predicate (see the sketch below). Apparently it really didn't like filtering on call_round_id_fk, for some reason, which I think should be fine, since that condition was just a precaution. – Scuba Steve Sep 24 '18 at 22:01
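A minimal example of the kind of test query mentioned in the comment above (the predicate comes straight from the delete; TOP (100) just keeps the test cheap):

-- Check how the predicate seeks on its own, without actually deleting anything.
SELECT TOP (100) *
FROM [Shift Offer]
WHERE offer_timestamp IS NULL
  AND callout_id_fk < 18605
  AND call_round_id_fk IS NULL;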

1 Answer

-- Prime @@ROWCOUNT so the loop body runs at least once.
SELECT 2;

WHILE (@@ROWCOUNT > 0)
BEGIN
    DELETE TOP (200)
    FROM [Shift Offer]
    WHERE offer_timestamp IS NULL
      AND callout_id_fk < 18605
      AND call_round_id_fk IS NULL;
END
paparazzo