2

Problem: I send Requests to my server with an Array of models are added to the database. One Array for example contains 5000 models. Now all of these have to be checked if their reference Id is already exists in Table One and if the id isn't in the actual table (Named Two in the example). If that is the case I will add it to the database. I will ask that 5000 times, once for each model which takes pretty long. Since I am seldom working with databases and the entity framework, I don't really know how to improve the query. What am I doing wrong, what can I improve?

foreach(twoModel model in Models){
  if (Context.TAB_One.All(m => m.Id != model.RefId_One)) {
    continue;
  }
  if (Context.TAB_Two.Any(m => m.Id == model.Id)) {
    continue;
  }
Context.TAB_Two.Add(model);
}
Stefan
  • 14,240
  • 9
  • 51
  • 69
JSTW1212
  • 65
  • 3
  • What is the total number of expected records found in `TAB_One` and `TAB_Two` respectively? – Igor Mar 21 '17 at 12:13
  • Tab One will contain much less. The data will be stored in the database and will never/or just seldomly be deleted. So for both Tabels the records will increase over time. Tab one will be added just once for each request where i add a set of Tab_two data. You could say there are about 5000 times more records but that can differ from about 500 to unknown (maybe 20000 entries/Tab_One entry). The short form is : 1...n where n is maybe between 2000 and 6000. – JSTW1212 Mar 21 '17 at 13:06

4 Answers4

1

The easiest thing to do is to return all the existing values in the database and build up a hash list.

// I guessed that it is an int, change it accordingly
var tableOneLookup = new HashSet<int>(Context.TAB_One.Id.Select(x => x.Id));
var tableTwoLookup = new HashSet<int>(Context.TAB_Two.Id.Select(x => x.Id));

// example of checking (check based on code you had shown before)
var modelsNotInTabOne = Models.Where(model => !tableOneLookup.Contains(model.RefIdOne)).ToList();
var modelsNotInTabTwo = Models.Where(model => !tableOneLookup.Contains(model.Id)).ToList();

// now do something with the results like create entity instances

If you are using strings as an identifier then you should add a string comparer. If you use StringComparer.OrdinalIgnoreCase it will ensure that case is NOT accounted for, pick the correct one you need.

new HashSet<int>(Context.TAB_Two.Id.Select(x => x.Id), StringComparer.OrdinalIgnoreCase);

Option 2

If your table has a massive number of records then the above solution is no longer performant. You could do the inverse by finding which ids are in the database and then filter on that result.

// create lists of just the ids
var modelOneIds = Models.Select(model => model.RefIdOne).ToList();
var modelTwoIds = Models.Select(model => model.Id).ToList();

// create a list of ids found in the db
var foundInTableOne = new List<int>();
var foundInTableTwo = new List<int>();

// iterate over the model in batches of 1000, the translated IN clause in Sql has a limit. You can tweak this number accordingly
const int batchSize = 1000;
for(int i = 0; i < Models.Count; i+=batchSize){
    var tmpModelOneIds = modelOneIds.Skip(i).Take(batchSize).ToList();
    var tmpModelTwoIds = modelTwoIds.Skip(i).Take(batchSize).ToList();

    foundInTableOne.AddRange(Context.TAB_One.Where(itm => tmpModelOneIds.Contains(itm.Id)).Select(itm => itm.Id));
    foundInTableTwo.AddRange(Context.TAB_Two.Where(itm => tmpModelTwoIds.Contains(itm.Id)).Select(itm => itm.Id));
}

// find the models not found in the DB
var modelsNotInTabOne = Models.Where(model => !foundInTableOne.Contains(model.RefIdOne)).ToList();
var modelsNotInTabTwo = Models.Where(model => !foundInTableTwo.Contains(model.Id)).ToList();

// now do something with the results like create entity instances
Igor
  • 55,253
  • 10
  • 80
  • 149
  • Thanks for the answer, as you mentioned the number of records is increasing rapidly so it will become a massive amount. Esoecially in the case of Tab_Two. – JSTW1212 Mar 21 '17 at 13:24
  • @JSTW1212 - glad it will work for you. Please consider marking an answer. – Igor Mar 21 '17 at 13:28
0

You could filter the models and add the result of this filter. For sample:

// define the array of IDs you want to search
var refIds = Models.Select(x => x.RefId_One).ToArray();
var ids = Models.Select(x => x.Id).ToArray();

// perform the queries to filter
var allModelOne = Context.TAB_One.All(m => !refIds.Contains(m.Id));
var anyModelTwo = Context.TAB_Two.Any(m => ids.Contains(m.Id));

// filter models 
var modelsToSave = model.Where(m => !allModelOne.Contains(m.Id) && !anyModelTwo.Contains(m.Id));

// save them
foreach(twoModel model in modelsToSave) {   
    Context.TAB_Two.Add(model);
}

Filtering it before, your code will avoid queries on the loop.

Obs: I am not sure about your model, but it should be something like the sample.

Felipe Oriani
  • 35,246
  • 17
  • 121
  • 176
0

Can I give you another way to achive this?

First of all take a look at this post: Fastest Way of Inserting in Entity Framework

As you can see, entity framework is not the best way to do large work.

So consider using Bulk insert and then execute an Store Procedure to do the hard work.

Or, if you really want to do it by C# & EF, at least, do it in branchs, ie, run SaveChanges() at 100 at a time.

Cheers

Community
  • 1
  • 1
André Pedroso
  • 1,334
  • 14
  • 15
0

I would read the id's from table one and table two into two Dictionary objects, Dictionary is one of the fastest data structures for searching if the key is already there. You should see an enormous improvement in the performance. You can then add the id's you need to add to table two into a list<> and do a batch insert at the end.

Neil Humby
  • 213
  • 1
  • 12