In an app that I'm writing I have two potentially large sets of data I need to map against each other. One is a List returned from a web service and one is a DataTable. I need to take the ANSI (or ISO) number for each item in the list and find the row of the DataTable containing that ANSI number and then do stuff with it.
Since DataTable.Select is pretty slow and I would have to call it once for each item in the List, I experimented with faster alternatives. Keep in mind that there is no database behind the DataTable object, so I can't leverage any SQL capabilities or anything like that.
I thought the fastest way might be to build a dictionary keyed by a KeyValuePair&lt;string, string&gt; — either ("A", ansiNumber) or ("I", isoNumber) — with the rest of the row as the value. Creating that dictionary would obviously take a little processing time, but then I could leverage the extremely fast lookups of the dictionary to find each row I need and add the rows back to the table afterwards. So within the foreach loop over the list I would only have a complexity of O(1) per lookup with the dictionary instead of O(n) or whatever DataTable.Select has.
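To make the setup concrete, here is a minimal sketch of the two dictionary variants. The table columns, the sample values, and the variable names are illustrative only, not my real data:

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// A tiny stand-in for the real DataTable (columns and rows are made up).
var table = new DataTable();
table.Columns.Add("Ansi", typeof(string));
table.Columns.Add("Material", typeof(string));
table.Rows.Add("0001", "Steel");
table.Rows.Add("0002", "Copper");

// Variant 1: plain string key (the fast case).
var byAnsi = new Dictionary<string, DataRow>();
// Variant 2: KeyValuePair key tagging the number type (the slow case).
var byKv = new Dictionary<KeyValuePair<string, string>, DataRow>();

foreach (DataRow row in table.Rows)
{
    string ansi = (string)row["Ansi"];
    byAnsi[ansi] = row;
    byKv[new KeyValuePair<string, string>("A", ansi)] = row;
}

// Both are O(1) lookups on paper; only the string key is fast in practice.
Console.WriteLine(byAnsi["0001"]["Material"]);                                      // Steel
Console.WriteLine(byKv[new KeyValuePair<string, string>("A", "0002")]["Material"]); // Copper
```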
To my surprise it turned out the dictionary was incredibly slow. I couldn't figure out why until I found out that using a string (just the ANSI number) instead of a KeyValuePair as the key increased the performance dramatically. I'm talking hundreds of times faster. How on Earth is that possible? Here is how I test:
I generate a List that simulates the output of the web service. From that list I build a dictionary whose key is either a string or a KeyValuePair and whose value is the DataRow. Then I loop over the list with foreach, look each item up in the dictionary, and assign a value to the DataRow that comes back. That's it.
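In sketch form the benchmark looks like this. The names and the int stand-in for the DataRow value are mine; the point is only that the dictionary construction happens outside the timed region:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

const int n = 10000;

// Simulated web-service output; keys are constructed so every lookup hits.
var items = new List<string>();
var dict = new Dictionary<string, int>(); // stand-in for Dictionary<string, DataRow>
for (int i = 0; i < n; i++)
{
    string ansi = i.ToString("D6");
    items.Add(ansi);
    dict[ansi] = i;
}

// Only the lookup loop is timed; list generation and dictionary
// construction are deliberately excluded.
var sw = Stopwatch.StartNew();
int hits = 0;
foreach (string ansi in items)
{
    int value = dict[ansi];
    hits++;
}
sw.Stop();

Console.WriteLine($"{hits} hits in {sw.ElapsedMilliseconds} ms");
```

Swapping the key type to KeyValuePair&lt;string, string&gt; means changing only the dictionary declaration and the key construction; the timed loop stays identical.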
If I use KeyValuePair as the key to access the dictionary, it takes seconds for 1,000 items; if I modify the dictionary to take only a string as the key, it takes milliseconds for 10,000 items. FYI: I designed the test so that there would always be hits, so all keys are always found.
Here is the block of code for which I'm measuring the time:
foreach (ProductList.Products item in pList.Output.Products)
{
    //Key used for the KeyValuePair<string, string> variant of the dictionary:
    //KeyValuePair<string, string> kv = new KeyValuePair<string, string>("A", item.Ansi);
    DataRow row = dict[item.Ansi];
    for (int i = 0; i < 10; i++)
    {
        row["Material"] = item.Material + "a"; //Do stuff just for debugging
    }
    hits++;
}
So how on Earth is it possible that the execution time becomes hundreds of times longer with a Dictionary&lt;KeyValuePair&lt;string, string&gt;, DataRow&gt; than with a Dictionary&lt;string, DataRow&gt;?