21

Why does forcing materialization using ToList() make my query orders of magnitude faster when, if anything, it should do the exact opposite?

1) Calling First() immediately

    // "Context" is an Entity Framework DB-first model

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    var User = query.First();

    //  ** The above takes 30+ seconds to run **

2) Calling First() after calling ToList():

    var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

    var User = query.ToList().First();     // Added ToList() before First()

    // ** Now it takes < 1 second to run! **

Update and Resolution

After getting the generated SQL, the only difference is, as expected, the addition of TOP (1) in the first query. As Andyz Smith says in his answer below, the root cause is that the SQL Server optimizer, in this particular case, chooses a worse execution plan when TOP (1) is added. Thus the problem has nothing to do with LINQ (which did the right thing by adding TOP (1)) and everything to do with the idiosyncrasies of SQL Server.

Community
  • 1
  • 1
JoeCool
  • 4,350
  • 11
  • 47
  • 66
  • 2
    As the problem continued to be investigated, the question was turning into another question altogether, so I figured I should clean it up and mark it answered, since the original question (why LINQ seemed to be doing something very strange) was indeed answered. Thanks everyone for the help on this. – JoeCool Aug 22 '13 at 14:16
  • Unless there's a good reason to keep this question closed after I drastically cleaned it up, please reopen. Thanks. – JoeCool Aug 26 '13 at 13:20
  • If you're only going for a single record you should use .Single, by using .First you run across the chance that there is more than one matching row and you may not be getting the data you expect. – Chris Jul 21 '16 at 17:25

3 Answers3

11

I can only think of one reason... To test it, can you please remove the Where clause and re-run the test? Comment here if the result is the first statement being faster, and i will explain why.

Edit
In the LINQ statement Where clause, you are using the .ToLower() method of the string. My guess is that LINQ does not have built in conversion to SQL for this method, so the resultant SQL is something line

SELECT *
FROM Users

Now, we know that LINQ lazy loads, but it also knows that since it has not evaluated the WHERE clause, it needs to load the elements to do the comparison.

Hypothesis
The first query is lazy loading EVERY element in the result set. It is then doing the .ToLower() comparison and returning the first result. This results in n requests to the server and a huge performance overhead. Cannot be sure without seeing the SQL Tracelog.

The Second statement calls ToList, which requests a batch SQL before doing the ToLower comparison, resulting in only one request to the server

Alternative Hypothesis
If the profiler shows only one server execution, try executing the same query with the Top 1 clause and see if it takes as long. As per this post (Why is doing a top(1) on an indexed column in SQL Server slow?) the TOP clause can sometimes mess with the SQL server optimiser and stop it using the correct indices.

Curiosity edit
try changing the LINQ to

var query = from x in Context.Users
            where x.Username.Equals(User.Identity.Name, StringComparison.OrdinalIgnoreCase)
            select x;

Credit to @Scott for finding the way to do case insensitive comparison in LINQ. Give it a go and see if it is faster.

Community
  • 1
  • 1
David Colwell
  • 2,342
  • 16
  • 30
  • 1
    You're right! I took out the where clause altogether and here are my results: with ToList() = 0.74s, without ToList() = 0.33 seconds. Those numbers make far more sense! – JoeCool Aug 20 '13 at 14:30
  • @JoeCool, Can you try the curiosity edit? I would be really curious to see if it is the tolower or the `.Equals()` causing the issue – David Colwell Aug 20 '13 at 14:42
  • As a sidenote the "correct" way to do a case insensitive compare using EF (according to [this SO post](http://stackoverflow.com/questions/18337103/why-does-adding-an-unnecessary-tolist-drastically-speed-this-linq-query-up)) is `x.Username.Equals(User.Identity.Name, StringComparison.OrdinalIgnoreCase)`. – Scott Chamberlain Aug 20 '13 at 14:49
  • @ScottChamberlain, that link is to this SO post. You had me at recursion ;-). Good tip tho, googled it. Will credit you in answer – David Colwell Aug 20 '13 at 14:51
  • 2
    Curse you `Ctrl-C`!!!, [here is the link I ment to give you](http://stackoverflow.com/questions/2994662/entity-framework-and-differences-between-contains-between-sql-and-objects-using) – Scott Chamberlain Aug 20 '13 at 14:53
  • @ScottChamberlain, Smitten with the copy-pasta! Will be interesting to see if it actually results in faster running LINQ or just re-creates the issue... I think LINQ training courses should be labelled 'Defense against the dark arts' – David Colwell Aug 20 '13 at 14:55
  • String.ToLower() is not an extension method... – Boris B. Aug 20 '13 at 18:08
  • 3
    Not only that `String.ToLower()` isn't an extension method, it's also listed as a *canonical function* at EF MSDN (http://msdn.microsoft.com/en-us/library/bb738534.aspx), which means that it is supported by *all* EF providers. Canonical functions are executed **at the data store** (http://msdn.microsoft.com/en-us/library/bb738626.aspx). – Boris B. Aug 20 '13 at 18:43
  • @boris, thanks for the info. If the tolower comparison is not causing the issue then the second set of linq I posted will not result in any performance gain. I would be interested to see the results – David Colwell Aug 21 '13 at 00:37
  • Just wanted to confirm Boris' comment -- when I get the SQL for the query (using ToTraceString()), it translates to a LOWER() call in SQL. I'm about to use Profiler to get the final SQL for both calls though, and that should be revealing. – JoeCool Aug 21 '13 at 13:27
3

The SQL won't be the same as Linq is lazy loading. So your call to .ToList() will force .Net to evaluate the expression, then in memory select the first() item.

Where as the other option should add top 1 into the SQL

E.G.

var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

 //SQL executed here
 var User = query.First();

and

var query = from x in Context.Users
                where x.Username.ToLower().Equals(User.Identity.Name.ToLower())
                select x;

 //SQL executed here!
 var list = query.ToList();
 var User = query.First();

As below, the first query should be faster! I would suggest doing a SQL profiler to see what's going on. The speed of the queries will depend on your data structure, number of records, indexes, etc.

The timing of your test will alter the results also. As a couple of people have mentioned in comments, the first time you hit EF it needs to initialise and load the metadata. so if you run these together, the first one should always be slow.

Here's some more info on EF performance considerations

notice the line:

Model and mapping metadata used by the Entity Framework is loaded into a MetadataWorkspace. This metadata is cached globally and is available to other instances of ObjectContext in the same application domain.

&

Because an open connection to the database consumes a valuable resource, the Entity Framework opens and closes the database connection only as needed. You can also explicitly open the connection. For more information, see Managing Connections and Transactions in the Entity Framework.

Community
  • 1
  • 1
Liam
  • 22,818
  • 25
  • 93
  • 157
  • 7
    This would only make the second query take more time, not less time. – Ryan Amies Aug 20 '13 at 14:08
  • ^ Exactly. I knew if it WERE different SQL, then it should take less time if anything! – JoeCool Aug 20 '13 at 14:09
  • 1
    `.ToList() will force .Net to evaluate the expression` but `First()` also does the same (force it). – King King Aug 20 '13 at 14:09
  • was just going to write this, ultimately the query has not executed until you have a value for User the query is still being modified – RoughPlace Aug 20 '13 at 14:10
  • Is there a way to get at the literal SQL generated by `query.First();` ? That way I could test it manually. – JoeCool Aug 20 '13 at 14:11
  • 1
    @JoeCool, if your using SQL Server, fire up SQL profiler, as posted in the answer – Liam Aug 20 '13 at 14:12
  • Will do. And that wasn't in the answer when I first asked :) – JoeCool Aug 20 '13 at 14:13
  • 1
    @JoeCool, I know! Give me chance! :D – Liam Aug 20 '13 at 14:14
  • @KingKing, yes, I know. the point is the timing of the SQL execution and the difference in the actual SQL that is sent to the SQL Server. – Liam Aug 20 '13 at 14:15
  • 1
    Also wanted to note that I only run one test at a time. I then stop the project, add/remove ToList(), and rerun to test again, so EF initialization can't be the issue. – JoeCool Aug 20 '13 at 14:34
  • 2
    An easy way to view generated SQL for calls to First() is using LinqPad (free), http://www.linqpad.net/. You'd then put your Linq expression in the query window and click the "SQL" tab to see the sql that gets generated. You'll notice that First() simply prepends a "TOP (1)" to the sql statement. – Tom Regan Aug 20 '13 at 14:47
  • @TomRegan, awesome! I'm not a SQL admin by any means, and when I've used Profiler before I've slowed down our entire DB haha, so I think I'll try linqpad first. – JoeCool Aug 20 '13 at 14:58
  • So the moral of the story is don't try to circumvent the LINQ query optimization engine by forcing a materialization with ToList. Doing so can cause unexpected results as what should be a simple conditional processed on the server side becomes a huge data pull followed by evaluation of the conditional on the client side. – Andyz Smith Aug 20 '13 at 15:36
  • @AndyzSmith -- actually the moral in this case is the exact opposite. Forcing materialization made my query orders of magnitude faster! – JoeCool Aug 21 '13 at 14:13
0

So, the optimizer chooses a bad way to run the query.

Since you can't add optimizer hints to the SQL to force the optimizer to choose a better plan I see two options.

  1. Add a covering index/indexed view on all the columns that are retrieved/included in the select Pretty ludicrous, but I think it will work, because that index will make it easy peasy for the optimizer to choose a better plan.

  2. Always prematurely materialize queries that include First or Last or Take.  Dangerous because as the data gets larger the break even point between pulling all the data locally and doing the First()  and doing the query with Top on the server is going to change.

http://geekswithblogs.net/Martinez/archive/2013/01/30/why-sql-top-may-slow-down-your-query-and-how.aspx

https://groups.google.com/forum/m/#!topic/microsoft.public.sqlserver.server/L2USxkyV1uw

http://connect.microsoft.com/SQLServer/feedback/details/781990/top-1-is-not-considered-as-a-factor-for-query-optimization

TOP slows down query

Why does TOP or SET ROWCOUNT make my query so slow?

Community
  • 1
  • 1
Andyz Smith
  • 662
  • 5
  • 19