Today I saw a LINQ query in query syntax in my project that counts the items in a List matching a specific condition, like this:

int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending     
            select A).ToList().Count();

I wanted to refactor it to use Count(Func) for readability, and I expected that to perform better as well, so I wrote:

int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);

But when I measure with Stopwatch, the lambda expression always takes longer than the query syntax:

Stopwatch s = new Stopwatch();
s.Start();
int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
s.Stop();
Stopwatch s2 = new Stopwatch();
s2.Start();
int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending
            select A).ToList().Count();
s2.Stop();

Can somebody explain why it is so?

Callum Watkins
Ehsan Sajjad
  • Have you changed the execution order of these queries? And is the result the same again? – Farhad Jabiyev Feb 18 '15 at 06:03
  • Did you do a JIT warmup prior to these? – Yuval Itzchakov Feb 18 '15 at 06:05
  • Same question as @FarhadJabiyev. Also, what source lies under `pTasks`? Is this an SQL database, or is it just LINQ to Objects? – Jeppe Stig Nielsen Feb 18 '15 at 06:07
  • It is LINQ to Objects – Ehsan Sajjad Feb 18 '15 at 06:14
  • @FarhadJabiyev Changing the order gives the same result – Ehsan Sajjad Feb 18 '15 at 06:18
  • Try removing `ToList()` when you check the query syntax – Grundy Feb 18 '15 at 06:20
  • @Grundy The query syntax is executing faster than the lambda – Ehsan Sajjad Feb 18 '15 at 06:36
  • As a first step you can try an equivalent lambda, which is `pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).ToList().Count()`. – serdar Feb 18 '15 at 07:30
  • In the second example, you're instantiating a list, while in the first you're not. Remove the ToList and compare again: `var temp = (from A in pTasks where A.StatusID == (int)BusinessRule.TaskStatus.Pending select A).Count();` – realbart Feb 18 '15 at 08:35
  • @realbart The point is that in the second one I am creating a new list and then counting, while in the first I am just counting, so why does the first one take longer than the second? – Ehsan Sajjad Feb 18 '15 at 09:14
  • @EhsanSajjad Because your second query will be translated into a lambda expression by the compiler, to this: `pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).Count();`. And `Where(predicate).Count()` is different from `Count(predicate)`. Also, `ToList` is not important here. I have tried to explain this in my answer. – Farhad Jabiyev Feb 18 '15 at 09:44
  • And the difference is that the `Where` iterator avoids an indirect virtual table call and calls the iterator methods directly. – Farhad Jabiyev Feb 18 '15 at 14:26
  • ehsan, see my answer below: the two factors that impact your performance are the allocation of an extra iterator (like Farhad described) and the allocation of a list. Depending on the size of your collection, one has the most weight. @Farhad, the same. If you run the performance example below you see that the allocation of a list can also be important, as it scales with the result set. – realbart Feb 18 '15 at 18:10

2 Answers

I have simulated your situation, and yes, there is a difference between the execution times of these queries. But the reason for this difference isn't the syntax of the query. It doesn't matter whether you use method syntax or query syntax. Both yield the same result, because query expressions are translated into their lambda expressions before they're compiled.

But if you look closely, the two queries aren't the same at all. Your second query will be translated to its lambda syntax before it's compiled (you can remove ToList() from the query, because it is redundant):

pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).Count();

Now we have two LINQ queries in lambda syntax: the one shown above, and this:

pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
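To make the translation concrete, here is a minimal sketch. The `Item` class, the `SyntaxDemo` class, and the sample data are made up for illustration and are not part of the original project. It shows the query-syntax form, its method-syntax equivalent, and the `Count(predicate)` form side by side; all three return the same count, and only the operators they call differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the task type in the question.
public class Item
{
    public int StatusID { get; set; }
}

public static class SyntaxDemo
{
    // Query syntax: the compiler rewrites this into a call to Where(...).
    // A trailing "select a" that only returns the range variable is dropped.
    public static int CountWithQuerySyntax(List<Item> items, int status)
    {
        return (from a in items
                where a.StatusID == status
                select a).Count();
    }

    // Method-syntax equivalent of the query above.
    public static int CountWithWhere(List<Item> items, int status)
    {
        return items.Where(a => a.StatusID == status).Count();
    }

    // The other query: Count with a predicate and no Where iterator.
    public static int CountWithPredicate(List<Item> items, int status)
    {
        return items.Count(a => a.StatusID == status);
    }

    public static void Main()
    {
        // Nine items with StatusID cycling through 0, 1, 2.
        var items = Enumerable.Range(0, 9)
            .Select(i => new Item { StatusID = i % 3 })
            .ToList();

        Console.WriteLine(CountWithQuerySyntax(items, 0)); // prints 3
        Console.WriteLine(CountWithWhere(items, 0));       // prints 3
        Console.WriteLine(CountWithPredicate(items, 0));   // prints 3
    }
}
```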

Now, the question is:
Why is there a difference between the execution times of these two queries?

Let's find the answer:
We can understand the reason for this difference by comparing these:
- Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate).Count(this IEnumerable<TSource> source)
and
- Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate)

Here is the implementation of Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static int Count<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    int count = 0;
    foreach (TSource element in source) {
        checked {
            if (predicate(element)) count++;
        }
    }
    return count;
}

And here is the Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) 
        throw Error.ArgumentNull("source");
    if (predicate == null) 
        throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) 
        return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) 
        return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) 
        return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

Pay attention to the Where() implementation. It returns a WhereListIterator if your collection is a List, whereas Count(predicate) just iterates over the source. In my opinion, the implementation of WhereListIterator contains a speed-up. After that we call the Count() overload that takes no predicate, which only iterates over the filtered collection.


Regarding that speed-up in the implementation of WhereListIterator:

I found this question on SO: LINQ performance Count vs Where and Count. You can read @Matthew Watson's answer there. He explains the performance difference between these two queries. The conclusion is that the Where iterator avoids an indirect virtual table call and calls the iterator methods directly. As you can see in that answer, a call instruction is emitted instead of callvirt, and callvirt is slower than call:

From the book CLR via C#:

When the callvirt IL instruction is used to call a virtual instance method, the CLR discovers the actual type of the object being used to make the call and then calls the method polymorphically. In order to determine the type, the variable being used to make the call must not be null. In other words, when compiling this call, the JIT compiler generates code that verifies that the variable’s value is not null. If it is null, the callvirt instruction causes the CLR to throw a NullReferenceException. This additional check means that the callvirt IL instruction executes slightly more slowly than the call instruction.
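As a rough illustration of the direct-versus-indirect call idea, here is a hand-written sketch. The `DispatchDemo` class and its method names are made up, and this is not the actual LINQ source. The two methods contain the same loop; the only difference is the static type of `source`, which decides whether `foreach` goes through the `IEnumerable<int>` interface or uses the `List<int>.Enumerator` struct directly.

```csharp
using System;
using System.Collections.Generic;

public static class DispatchDemo
{
    // Enumerates through the IEnumerable<T> interface: GetEnumerator, MoveNext
    // and Current are interface calls on a boxed enumerator, which is how the
    // Count(predicate) implementation quoted above walks its source.
    public static int CountViaInterface(IEnumerable<int> source, Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int element in source)
        {
            if (predicate(element)) count++;
        }
        return count;
    }

    // Enumerates a List<T> directly: foreach binds to List<T>.GetEnumerator(),
    // which returns the List<T>.Enumerator struct, so no interface dispatch
    // is needed for MoveNext and Current.
    public static int CountViaList(List<int> source, Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int element in source)
        {
            if (predicate(element)) count++;
        }
        return count;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };
        Func<int, bool> isEven = x => x % 2 == 0;

        Console.WriteLine(CountViaInterface(numbers, isEven)); // prints 3
        Console.WriteLine(CountViaList(numbers, isEven));      // prints 3
    }
}
```

Both methods return the same result. How large the timing gap is, if any, depends on the runtime version and the collection size, so it is worth measuring rather than assuming.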

Farhad Jabiyev
  • I think this is a good investigation, but I think you should have compared `int Count(this IEnumerable source, Func predicate)` to `int Count(this IEnumerable source)`. – Enigmativity Feb 18 '15 at 06:49
  • @Enigmativity Yes, after calling `Count` it will iterate only over the filtered collection. I have updated my answer. – Farhad Jabiyev Feb 18 '15 at 06:59

As Farhad said, the implementations of Where(x).Count() and Count(x) differ. The first one instantiates an additional iterator, which on my PC costs about 30,000 ticks (regardless of the collection size).

Also, ToList is not free. It allocates memory, and that costs time. On my PC, it roughly doubles the execution time (so the cost depends linearly on the collection size).

Also, the first run includes spin-up time (for example JIT warm-up), so it's difficult to measure performance accurately in one go. I'd recommend a loop like this example, and then ignoring the first set of results.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
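            // Note: GetTasks() is a lazy iterator, so every query below re-creates the items.
            // Append .ToList() here to benchmark against a List<Task>, as in the question.
            var pTasks = Task.GetTasks();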
            for (int i = 0; i < 5; i++)
            {

                var s1 = Stopwatch.StartNew();
                var count1 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s1.Stop();
                Console.WriteLine(s1.ElapsedTicks);

                var s2 = Stopwatch.StartNew();
                var count2 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).ToList().Count();
                s2.Stop();
                Console.WriteLine(s2.ElapsedTicks);

                var s3 = Stopwatch.StartNew();
                var count3 = pTasks.Where(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending).Count();
                s3.Stop();
                Console.WriteLine(s3.ElapsedTicks);


                var s4 = Stopwatch.StartNew();
                var count4 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).Count();
                s4.Stop();
                Console.WriteLine(s4.ElapsedTicks);

                var s5 = Stopwatch.StartNew();
                var count5 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s5.Stop();
                Console.WriteLine(s5.ElapsedTicks);
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }

    public class Task
    {
        public static IEnumerable<Task> GetTasks()
        {
            for (int i = 0; i < 10000000; i++)
            {
                yield return new Task { StatusID = i % 3 };
            }
        }

        public int StatusID { get; set; }
    }

    public class BusinessRule
    {
        public enum TaskStatus
        {
            Pending,
            Other
        }
    }
}
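The warm-up advice above can also be wrapped in a small helper. This is only a sketch under stated assumptions: `TimingDemo` and `MeasureTicks` are hypothetical names, and reporting the fastest run is one possible choice among several (a mean or median would work too). It runs the action once untimed, then times it a given number of times.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TimingDemo
{
    // Runs the action once untimed as a warm-up, so one-off costs such as
    // JIT compilation are excluded, then returns the fastest of 'runs'
    // timed executions, in Stopwatch ticks.
    public static long MeasureTicks(Action action, int runs)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (runs < 1) throw new ArgumentOutOfRangeException("runs");

        action(); // warm-up run, result discarded

        long best = long.MaxValue;
        for (int i = 0; i < runs; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            best = Math.Min(best, stopwatch.ElapsedTicks);
        }
        return best;
    }

    public static void Main()
    {
        // Materialize the data once so that only the queries are timed.
        var numbers = Enumerable.Range(0, 1000000).ToList();

        long countTicks = MeasureTicks(() => numbers.Count(x => x % 3 == 0), 5);
        long whereTicks = MeasureTicks(() => numbers.Where(x => x % 3 == 0).Count(), 5);

        Console.WriteLine("Count(predicate):        " + countTicks);
        Console.WriteLine("Where(predicate).Count(): " + whereTicks);
    }
}
```

The measured values vary by machine, runtime version, and build configuration, so compare them in a Release build without the debugger attached.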
Amal K
realbart