
I have an application which receives data from multiple sockets and then writes the data into a database.

I am currently using EF to do this. I would like to know how I can make it more efficient.

I have read that doing a bulk insert is faster, so I am only saving changes to the DB every 500 inserts:

    db.Logs_In.Add(tableItem);
    if (logBufferCounter++ > 500)
    {
        db.SaveChanges();
        logBufferCounter = 0;
    }

Now I have profiled the application and 74% of the work is being done by the function System.Data.Entity.DbSet`1[System.__Canon].Add.

Is there a better way to do the insert? Maybe queue the tableItems up into a List and then add the whole list to the DB context in one go?
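
A minimal sketch of that idea, assuming EF 6 (where DbSet<T> exposes AddRange) and assuming the entity class is also called Logs_In; the buffer list and batch size here are only illustrative:

    // Collect items in a plain list, then hand the whole batch to EF at once.
    List<Logs_In> buffer = new List<Logs_In>(500);

    void Buffer(Logs_In tableItem)
    {
        buffer.Add(tableItem);
        if (buffer.Count >= 500)
        {
            db.Logs_In.AddRange(buffer);   // one AddRange call instead of 500 Add calls
            db.SaveChanges();
            buffer.Clear();
        }
    }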

Or maybe I'm looking at it all wrong and I should avoid Entity Framework entirely for this high-performance insert? Currently it is the bottleneck in my application, and if I look at the system resources, SQL Server doesn't even seem to be batting an eyelid.

So my questions:

1: How would I achieve the most efficient / quickest insert when doing many inserts?

2: If EF is acceptable, how can I improve my solution?

I am using SQL Server 2012 Enterprise Edition. The incoming data is a constant stream, but I can afford to buffer it and then do a bulk insert if that is a better solution.

[EDIT]

To further explain the scenario: I have a thread looping on a ConcurrentQueue and dequeuing the items from it. However, because the DB insert is the bottleneck, there are often thousands of entries in the queue. So I am also wondering whether there is an async or parallel way in which I could use more than one thread to do the insert.
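
Roughly, the consumer side currently looks like this (entity name, batch size, and the WriteBatch helper are illustrative, not the real code):

    // Single consumer thread draining a ConcurrentQueue<T> in batches.
    // Requires System.Collections.Concurrent and System.Threading.
    ConcurrentQueue<Logs_In> queue = new ConcurrentQueue<Logs_In>();

    void ConsumerLoop(CancellationToken token)
    {
        var batch = new List<Logs_In>(500);
        while (!token.IsCancellationRequested)
        {
            Logs_In item;
            while (batch.Count < 500 && queue.TryDequeue(out item))
            {
                batch.Add(item);
            }

            if (batch.Count > 0)
            {
                WriteBatch(batch);     // the EF insert that is currently the bottleneck
                batch.Clear();
            }
            else
            {
                Thread.Sleep(50);      // queue empty; back off briefly
            }
        }
    }
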

Zapnologica
  • Pretty good discussion here http://stackoverflow.com/a/5942176/1663001 on high-frequency inserts. – DavidG Jul 22 '14 at 08:52
  • 1
  • @DavidG Thanks, that really helps if I go the EF route. I made the suggested changes and I already have a 100x performance increase. – Zapnologica Jul 22 '14 at 09:20
  • That's awesome, I hope it's enough for you. If not, then Marc's answer is the way to go. – DavidG Jul 22 '14 at 09:25

1 Answer


For scenarios that involve large amounts of inserts, I tend to favor "buffer separately" (in-memory, or a redis list, or whatever), then as a batch job (perhaps every minute, or every few minutes) read the list and use SqlBulkCopy to throw the data into the database as efficiently as possible. To help with that, I use FastMember's ObjectReader.Create method, which exposes a List<T> (or any IEnumerable<T>) as an IDataReader that can be fed into SqlBulkCopy, exposing the properties of T as logical columns in the data reader. All you need to do, then, is fill the List<T> from the buffer.
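
A rough sketch of that shape (the LogEntry class, column names, and the Logs_In table are assumptions for illustration; FastMember is the NuGet package of the same name):

    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using FastMember;

    public class LogEntry
    {
        public int Id { get; set; }
        public string Source { get; set; }
        public string Message { get; set; }
        public DateTime CreatedUtc { get; set; }
    }

    public static class BulkWriter
    {
        public static void Write(string connectionString, List<LogEntry> batch)
        {
            var columns = new[] { "Id", "Source", "Message", "CreatedUtc" };
            using (var bcp = new SqlBulkCopy(connectionString))
            using (var reader = ObjectReader.Create(batch, columns))
            {
                bcp.DestinationTableName = "Logs_In";       // assumed table name
                foreach (var col in columns)
                {
                    bcp.ColumnMappings.Add(col, col);       // map by name, not ordinal
                }
                bcp.WriteToServer(reader);                  // streams the list; no DataTable needed
            }
        }
    }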

Note, however, that you need to think about the "something goes wrong" scenario; i.e. if the insert fails halfway through, what do you do about the data in the buffer? One option here is to do the SqlBulkCopy into a staging table (same schema, but not the "live" table), then use a regular INSERT to copy the data across in one step once you know it has reached the database - this makes recovery simpler.
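
To make that concrete, the two-step version might look something like this (Logs_In_Staging and the column list are assumptions; the reader would typically be the same ObjectReader.Create output as above):

    using System.Data;
    using System.Data.SqlClient;

    public static void WriteViaStaging(string connectionString, IDataReader reader)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // 1: bulk copy into a staging table with the same schema as the live table
            using (var bcp = new SqlBulkCopy(conn))
            {
                bcp.DestinationTableName = "Logs_In_Staging";
                bcp.WriteToServer(reader);
            }

            // 2: move the rows into the live table in one transaction; if this part
            // fails, the data is still sitting in staging and the step can be re-run
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SqlCommand(
                @"INSERT INTO Logs_In (Id, Source, Message, CreatedUtc)
                  SELECT Id, Source, Message, CreatedUtc FROM Logs_In_Staging;
                  DELETE FROM Logs_In_Staging;", conn, tx))
            {
                cmd.ExecuteNonQuery();
                tx.Commit();
            }
        }
    }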

Marc Gravell
  • +1 on the error issue. Is there any way to somehow insert all the ones that work, and then possibly put the failed inserts into a failed table? – Zapnologica Jul 22 '14 at 08:58
  • 1
  • @Zapnologica that depends on the nature of the error. If the error is that your connection to the sql server broke half way through then... what are you going to do? Indeed, because `SqlBulkCopy` doesn't tell you how far it got (let alone tell you what is in the roll-forward part of a log file, if the database server crashed totally), your options are limited. A "redo from start" strategy is a reasonable one, if you can compensate for duplicates in the later retry. A "the data isn't **that** important; I don't care if we lose a few rows from one batch" strategy can also sometimes be valid. – Marc Gravell Jul 22 '14 at 09:02