41

From a performance point of view, which should you use: nested `foreach` loops or lambda/LINQ queries?

JSC
  • https://gist.github.com/hazzik/6912865 tl;dr nested `foreach` is faster than `SelectMany`, but `foreach` + `List.AddRange()` is fastest – Ian Kemp May 02 '18 at 12:20

5 Answers

63

Write the clearest code you can, and then benchmark and profile to discover any performance problems. If you do have performance problems, you can experiment with different code to work out whether it's faster or not (measuring all the time with as realistic data as possible) and then make a judgement call as to whether the improvement in performance is worth the readability hit.

A direct foreach approach will be faster than LINQ in many cases. For example, consider:

var query = from element in list
            where element.X > 2
            where element.Y < 2
            select element.X + element.Y;

foreach (var value in query)
{
    Console.WriteLine(value);
}

Now there are two where clauses and a select clause, so every eventual item has to pass through three iterators. (Obviously the two where clauses could be combined in this case, but I'm making a general point.)

Now compare it with the direct code:

foreach (var element in list)
{
    if (element.X > 2 && element.Y < 2)
    {
        Console.WriteLine(element.X + element.Y);
    }
}

That will run faster, because it has fewer hoops to jump through. Chances are that the console output will dwarf the iterator cost though, and I'd certainly prefer the LINQ query.
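
For illustration, here is a minimal sketch of how the two versions above might be measured with Stopwatch; the Element type and the sample data are assumptions made up for this example, and the results are summed instead of printed so that console I/O doesn't dominate the measurement:

using System;
using System.Diagnostics;
using System.Linq;

class Element
{
    public int X { get; set; }
    public int Y { get; set; }
}

class Benchmark
{
    static void Main()
    {
        // Assumed sample data - measure with data as realistic as possible.
        var list = Enumerable.Range(0, 1_000_000)
                             .Select(i => new Element { X = i % 10, Y = i % 7 })
                             .ToList();

        var sw = Stopwatch.StartNew();
        long linqSum = list.Where(e => e.X > 2)
                           .Where(e => e.Y < 2)
                           .Select(e => e.X + e.Y)
                           .Sum();
        sw.Stop();
        Console.WriteLine($"LINQ:    {sw.ElapsedMilliseconds} ms (sum {linqSum})");

        sw.Restart();
        long loopSum = 0;
        foreach (var element in list)
        {
            if (element.X > 2 && element.Y < 2)
            {
                loopSum += element.X + element.Y;
            }
        }
        sw.Stop();
        Console.WriteLine($"foreach: {sw.ElapsedMilliseconds} ms (sum {loopSum})");
    }
}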

EDIT: To answer about "nested foreach" loops... typically those are represented with SelectMany or a second from clause:

var query = from item in firstSequence
            from nestedItem in item.NestedItems
            select item.BaseCount + nestedItem.NestedCount;

Here we're only adding a single extra iterator, because we'd already be using an extra iterator per item in the first sequence due to the nested foreach loop. There's still a bit of overhead, including the overhead of doing the projection in a delegate instead of "inline" (something I didn't mention before) but it still won't be very different to the nested-foreach performance.
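
For comparison, this is roughly the hand-written nested foreach that the query above corresponds to (a minimal sketch; the Console.WriteLine is just one way of consuming the projected values):

foreach (var item in firstSequence)
{
    // Inner loop per outer item - the same "extra iterator per item" mentioned above.
    foreach (var nestedItem in item.NestedItems)
    {
        Console.WriteLine(item.BaseCount + nestedItem.NestedCount);
    }
}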

This is not to say you can't shoot yourself in the foot with LINQ, of course. You can write stupendously inefficient queries if you don't engage your brain first - but that's far from unique to LINQ...

Jon Skeet
24

If you do

foreach(Customer c in Customer)
{
  foreach(Order o in Orders)
  {
    //do something with c and o
  }
}

You will perform Customer.Count * Order.Count iterations


If you do

var query =
  from c in Customer
  join o in Orders on c.CustomerID equals o.CustomerID
  select new { c, o };

foreach(var x in query)
{
  //do something with x.c and x.o
}

You will perform Customer.Count + Order.Count iterations, because Enumerable.Join is implemented as a HashJoin.
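
To see why that stays roughly linear, here is a minimal sketch of a hash join written by hand with ToLookup - an illustration of the idea, not the actual Enumerable.Join implementation:

// One pass over Orders builds a hash-based lookup keyed by CustomerID.
var ordersByCustomer = Orders.ToLookup(o => o.CustomerID);

// One pass over Customer; each lookup is an O(1) hash probe on average,
// and every order is visited exactly once across all of the inner groups.
foreach (Customer c in Customer)
{
    foreach (Order o in ordersByCustomer[c.CustomerID])
    {
        // do something with c and o
    }
}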

Amy B
  • great answer with a good explanation – Josh E Jun 25 '09 at 14:39
  • 6
    That's because you're running two different algorithms; you're comparing apples to oranges. I don't see how this is relevant to the question posed. – mqp Jun 25 '09 at 14:50
  • 4
    Your nested foreach is actually equivalent to SelectMany, not Join - i.e. from c in Customer from o in Orders ... (no join) – Marc Gravell Jun 25 '09 at 15:03
  • 2
    Upvote! @mquander: It's not very unlikely that the OP will have "if (c.CustomerID != o.CustomerID) continue;" as the first statement inside the loop. – erikkallen Jun 25 '09 at 15:07
  • Nice explanation. How can I modify the query for a left outer join? I have 2 nested loops, but the common field could be null. In that case I insert else update. – One-One May 11 '12 at 05:44
  • Got it Sir, worked it out myself after some trial and error. I was learning LINQ and adapting it for day to day tasks. Your explanation was very helpful. – One-One May 11 '12 at 16:12
13

It is more complicated than that. Ultimately, much of LINQ-to-Objects is (behind the scenes) a foreach loop, but with the added overhead of a little abstraction / iterator blocks / etc. However, unless you do very different things in your two versions (foreach vs LINQ), they should both be O(N).

The real question is: is there a better way of writing your specific algorithm that means that foreach would be inefficient? And can LINQ do it for you?

For example, LINQ makes it easy to hash / group / sort data.
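
A minimal sketch of that last point (the orders collection and its CustomerID / OrderDate properties are assumptions made up for the example):

// Group orders by customer, then sort each group by date - no hand-written
// dictionaries or nested loops required.
var ordersByCustomer = orders
    .GroupBy(o => o.CustomerID)
    .Select(g => new
    {
        CustomerID = g.Key,
        Orders = g.OrderBy(o => o.OrderDate).ToList()
    });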

Marc Gravell
  • May I ask what is O(N)? – Jeb50 Nov 01 '20 at 22:51
  • @Jeb50 that's a common way of indicating the performance characteristics of something; O(N) means that the time taken scales linearly as a factor of the number of elements (N). O(N) is almost always fine - naive sorting would be O(N^2), for example, which is much worse, but exponential (for example O(2^N)) or super-exponential (for example O(N!)) would be worse again. – Marc Gravell Nov 02 '20 at 07:08
3

It's been said before, but it merits repeating.

Developers never know where the performance bottleneck is until they run performance tests.

The same is true for comparing technique A to technique B. Unless there is a dramatic difference, you just have to test it. It might be obvious if you have an O(n) vs O(n^x) scenario, but since the LINQ stuff is mostly compiler witchcraft, it merits profiling.

Besides, unless your project is in production and you have profiled the code and found that the loop in question is slowing down your execution, go with whichever version you prefer for readability and maintenance. Premature optimization is the devil.

Drithyin
  • 1
    While it is true that you cannot truly anticipate performance bottlenecks, it is also true that most performance issues are designed in, are found very late in the development life cycle and are therefore difficult to code out. There is a great deal to be said for always having an eye open to the performance implications of the design and implementation decisions you are making, rather than blithely coding away hoping it will be ok. – Mike Jun 11 '10 at 13:40
2

A great benefit is that using LINQ-to-Objects queries gives you the ability to easily turn the query over to PLINQ and have the system automatically perform the operation on the correct number of threads for the current system.

If you are using this technique on big datasets, that can easily become a big win for very little trouble.
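
A minimal sketch of what that looks like, reusing the filtering example from the accepted answer (AsParallel comes from System.Linq; how many threads are actually used depends on the machine):

var results = list.AsParallel()                     // partition the work across worker threads
                  .Where(e => e.X > 2 && e.Y < 2)   // same filter as before, now evaluated in parallel
                  .Select(e => e.X + e.Y)
                  .ToList();                        // note: result order is not guaranteed unless AsOrdered() is used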

Denis Troller
  • True but there are also parallel equivalents proposed for foreach as well. http://www.danielmoth.com/Blog/2009/01/parallelising-loops-in-net-4.html – jpierson Nov 17 '09 at 12:38