
We are importing a CSV file with CsvReader and then using SqlBulkCopy to insert that data into SQL Server. This code works for us and is very simple, but is there a faster method (some of our files have 100,000 rows) that would not get too complex?

        SqlConnection conn = new SqlConnection(connectionString);
        conn.Open();
        SqlTransaction transaction = conn.BeginTransaction();
        try
        {
            // CsvReader and SqlBulkCopy are both IDisposable, so dispose them as well
            using (TextReader reader = File.OpenText(sourceFileLocation))
            using (CsvReader csv = new CsvReader(reader, true))
            using (SqlBulkCopy copy = new SqlBulkCopy(conn, SqlBulkCopyOptions.KeepIdentity, transaction))
            {
                copy.DestinationTableName = reportType.ToString();
                copy.WriteToServer(csv);    // writes rows from the reader into the destination table
                transaction.Commit();
            }
        }
        catch (Exception ex)
        {
            transaction.Rollback();
            success = false;
            SendFileImportErrorEmail(Path.GetFileName(sourceFileLocation), ex.Message);
        }
        finally
        {
            conn.Close();
        }
Austin Harris
    I doubt you'll find anything faster than this. How long does it take now? How long do you expect it to take? – Robert Harvey Jan 08 '16 at 01:10
  • 4 minutes with a csv file with 180,000 rows as our test data. Compared to 1.5 minutes with SSIS, which we would rather not use. – Austin Harris Jan 08 '16 at 01:12
  • If performance is critical in your integration scenario I would recommend SSIS even though it has its kinks... On SQL Server 2012+ you can also develop SSDT through Visual Studio. – Mario Tacke Jan 08 '16 at 01:13
  • How many fields in the CSV file? How large is the file at 180k rows? 4 minutes seems abnormally long for only 180k rows, especially for just doing a straight INSERT via `SqlBulkCopy`. Also, it looks like there might be a copy/paste error in your sample code as it seems that the entire chunk of code is duplicated in place of the closing `}` in the first `finally` block. – Solomon Rutzky Jan 08 '16 at 02:24
  • And what "CsvReader" code are you using, exactly? I have found a few different ones, but I am guessing that the one you are using is returning a `DataTable`. If so, then loading the entire CSV into memory before calling `SqlBulkCopy` would explain some of the 4 minutes that it is taking, and why SSIS is not taking as long. Similar to SSIS, the method I describe in my [answer](http://stackoverflow.com/a/34668439/577765) also streams the data from the file into SQL Server, so it isn't being loaded fully into memory first :-). – Solomon Rutzky Jan 08 '16 at 05:46
  • Please consider marking your question answered or editing your question to get more information! :) – Mario Tacke Jul 08 '16 at 17:22

2 Answers


Instead of building your own tool to do this, have a look at SQL Server Import and Export / SSIS. You can target flat files and SQL Server databases directly. The resulting .dtsx package can also be run from the command line or as a job through the SQL Server Agent.

I am suggesting it because the wizard is optimized for parallelism and works really well on large flat files.
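
For the automation question in the comments below: besides the command line and SQL Server Agent, the package can also be launched from existing .NET code through the SSIS managed runtime. A minimal sketch, assuming the SSIS runtime is installed (the `Microsoft.SqlServer.ManagedDTS` assembly) and using a hypothetical package path:

    using System;
    using Microsoft.SqlServer.Dts.Runtime;   // from Microsoft.SqlServer.ManagedDTS.dll

    static class PackageRunner
    {
        public static void Run()
        {
            // Hypothetical path to the .dtsx package produced by the Import/Export wizard.
            const string packagePath = @"C:\Packages\ImportReport.dtsx";

            Application app = new Application();
            Package package = app.LoadPackage(packagePath, null);   // null = no custom event listener

            DTSExecResult result = package.Execute();
            if (result != DTSExecResult.Success)
            {
                // package.Errors holds the details; hook this into your existing error e-mail routine.
                Console.WriteLine("Package failed: " + result);
            }
        }
    }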

Mario Tacke
  • Thank you. Would this work as an automated process? – Austin Harris Jan 08 '16 at 01:14
  • Yes, you can, for example, schedule the package in SQL Server as a job/task. I generally start with the wizard and then modify the package in BIDS/SSDT depending on your version of SQL Server (see my comment on your original question). – Mario Tacke Jan 08 '16 at 01:15

You should consider using a Table-Valued Parameter (TVP), which is based on a User-Defined Table Type (UDTT). This ability was introduced in SQL Server 2008 and allows you to define a strongly-typed structure that can be used to stream data into SQL Server (if done properly). An advantage of this approach over using SqlBulkCopy is that you can do more than a simple INSERT into a table; you can do any logic that you want (validate / upsert / etc.) since the data arrives in the form of a Table Variable.

You can deal with all of the import logic in a single Stored Procedure that can easily use local temporary tables if any of the data needs to be staged first. This makes it rather easy to isolate the process such that you can run multiple instances at the same time, as long as you have a way to logically separate the rows being imported.
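
Below is a minimal sketch of that pattern, assuming a hypothetical two-column user-defined table type `dbo.ImportRowType` and a stored procedure `dbo.ImportReportRows` that accepts it; the CSV handling is a naive `Split` purely for illustration and would be replaced by your actual CsvReader:

    // T-SQL assumed to exist (names are hypothetical):
    //   CREATE TYPE dbo.ImportRowType AS TABLE (Col1 INT, Col2 VARCHAR(50));
    //   CREATE PROCEDURE dbo.ImportReportRows (@Rows dbo.ImportRowType READONLY)
    //   AS ... INSERT / validate / upsert using @Rows ...;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.IO;
    using Microsoft.SqlServer.Server;

    static class TvpImport
    {
        // Yields one SqlDataRecord per CSV line; rows are read lazily, so nothing
        // is buffered in memory before the call to the stored procedure.
        static IEnumerable<SqlDataRecord> GetRows(string sourceFileLocation)
        {
            SqlMetaData[] layout =
            {
                new SqlMetaData("Col1", SqlDbType.Int),
                new SqlMetaData("Col2", SqlDbType.VarChar, 50)
            };

            using (TextReader reader = File.OpenText(sourceFileLocation))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    string[] fields = line.Split(',');   // naive parsing; use your CsvReader here
                    SqlDataRecord record = new SqlDataRecord(layout);
                    record.SetInt32(0, int.Parse(fields[0]));
                    record.SetString(1, fields[1]);
                    yield return record;                 // streamed to SQL Server as it is produced
                }
            }
        }

        public static void ImportFile(string connectionString, string sourceFileLocation)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            using (SqlCommand cmd = new SqlCommand("dbo.ImportReportRows", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                SqlParameter tvp = cmd.Parameters.Add("@Rows", SqlDbType.Structured);
                tvp.TypeName = "dbo.ImportRowType";          // the UDTT the procedure expects
                tvp.Value = GetRows(sourceFileLocation);     // IEnumerable<SqlDataRecord> => streaming

                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

All of the per-row logic then lives in the stored procedure, which is what makes the validate/upsert scenarios above straightforward.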

I posted a detailed answer on this topic here on S.O. a while ago, including example code and links to other info:

How can I insert 10 million records in the shortest time possible?

There is even a link to a related answer of mine that shows another variation on that theme. I have a third answer somewhere that shows a batched approach if you have millions of rows, which you don't, but as soon as I find that I will add the link here.

Solomon Rutzky