0

I am looking for much more better way to update tables using SSIS. Specifically, i wanted to optimize the updates on tables (around 10 tables uses same logic).

The logic is,

  1. Select the source data from staging then inserts into physical temp table in the DW (i.e TMP_Tbl)
  2. Update all data matching by customerId column from TMP_Tbl to MyTbl.
  3. Inserts all non-existing customerId column from TMP_Tbl1 to MyTbl.

Using the above steps, this takes some time populating TMP_Tbl. Hence, i planned to change the logic to delete-insert but according to this: In SQL, is UPDATE always faster than DELETE+INSERT? this would be a recipe for pain.

Given:

  • no index/keys used on the tables
  • some tables contains 5M rows, some contains 2k rows
  • each table update took up to 2-3 minutes, which took for about (15 to 20 minutes) all in all
  • these updates we're in separate sequence container simultaneously runs

Anyone knows what's the best way to use, seems like using physical temp table needs to be remove, is this normal?

Community
  • 1
  • 1
Victor111
  • 58
  • 8
  • performance could be improved by creating some indexes. without any, 15 minutes doesn't sound too bad considering you're running against millions of rows. SSIS is designed to do Inserts, updates, so you shouldn't really consider doing delete/insert unless you have a really special reason to do so. – Tanner Apr 21 '17 at 08:21
  • Correct me if I'm wrong with this. I think it will consume much more time on the updates if I add a non-clustered index on my table. – Victor111 Apr 21 '17 at 08:41

1 Answers1

0

With SSIS you usually BULK INSERT, not INSERT. So if you do not mind DELETE - reinserting the rows should in general outperform UPDATE.

Considering this the faster approach will be:

  1. [Execute SQL Task] Delete all records which you need to update. (Depending on your DB design and queries, some index may help here).

  2. [Data Flow Task] Fast load (using OLE DB Destination, Data access mode: Table of fiew - fast load) both updated and new records from source into MyTbl. No need for temp tables here.

If you cannot/don't want to DELETE records - your current approach is OK too. You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per every record updated is way too long. If it is 2-3 minutes for updating millions of records though - then it's acceptable.

Adding the correct non-clustered index to a table should not result in "much more time on the updates". There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table - it is usually well worth it.

helix
  • 346
  • 2
  • 9