1

I have an in memory List of objects. I want to check if each one exists in a database and if not, set a bool property on that object to true.

Object

class Part
    {
        public bool NewPart { get; set; }
        public string PartNumber { get; set; }
        public string Revision { get; set; }
        public string Description { get; set; }
    }

List contains the collection of parts. For each part, if it exists in the database then NewPart should be set to FALSE, else TRUE. I'm looking for the most efficient way to do this as there are likely to be hundred of parts so I'm thinking that running a SQL query for each part may not be the most efficient method.

Ideas on the best way to achieve this appreciated.

sparkymark75
  • 583
  • 8
  • 28

3 Answers3

1

It depends on which ORM you are using, but with Linq2Sql you can use a query like:

var query = from p in db.parts
            where myList.Contains(p.PartNumber)
            select p.PartNumber;

You can then use the IEnumerable returned to set your newPart field

As an alternative, if your ultimate goal is to do an Upsert type action, then check out this question and its answers Insert Update stored proc on SQL Server (needs SQL level implementation, not linq)

Community
  • 1
  • 1
Stuart
  • 65,726
  • 7
  • 109
  • 161
1

The following will only hit the database once.

   var myList = (from r in parts select r.PartNumber).ToList();

    var existingParts = (from r in dc.Parts
       where myList.Contains(r.PartNumber) select r.PartNumber).ToList();

    foreach(var r in parts)
         r.NewPart = existingParts.Contains(r.PartNumber);

Note, the generated sql could very well be something like

SELECT PartNumber
FROM Parts Where PartNumber in (@p0, @p1, @p2, @p3 .... )

so this should work if the parts list of a hundred or so, but not if it is over 2100.

sgmoore
  • 14,470
  • 5
  • 37
  • 65
0

This is one of those cases where the most efficient approach depends upon the actual data.

The first obtains all partNums from the database:

HashSet<int> partNums = new HashSet<int>(from p in GetTable<DBPart> select p.PartNumber);
foreach(var p in parts)
  p.NewPart = partNums.Contains(p.PartNumber);

The second queries the database with the relevant partNumbers:

HashSet<int> partNums = new HashSet<int>(
  from p in GetTable<DBPart> where (from mp in parts select mp.PartNumber).Contains(p.PartNumber) select p.PartNumber);
foreach(var p in parts)
  p.NewPart = partNums.Contains(p.PartNumber);

The former will be more efficient above a certain number of rows in the database, and less efficient above it, because the latter takes a longer time to build a more complicated query, but the former returns everything.

Another factor is the percentage of hits expected. If this number is relatively low (i.e. only a small number of the parts in the list will be in the database) then it could be more efficient to do:

Dictionary<int, Part> dict = partsSource.ToDictionary(p => p.PartNumber, p);
foreach(int pn in 
  from p in GetTable<DBPart> where (from kv in dict select kv.Key).Contains(p.PartNumber) select p.PartNumber);
  dict[pn].NewPart = true;

Where partsSource is the means by which the List parts was obtained in the first place, here instead of obtaining a list, we obtain a dictionary, which makes for more efficient retrieval of those we want. However, it we're going to obtain parts as a list anyway, then we can't really gain here, as we use slightly more effort building the dictionary in the first place, than iterating through the list.

Jon Hanna
  • 102,999
  • 9
  • 134
  • 232