30

I have an MVC 4 application with EF 6. After upgrading from EF 5 to EF 6 I noticed a performance issue with one of my linq-entities queries. At first I was excited because on my development box I noticed a 50% improvement from EF 5 to EF 6. This query returns about 73,000 records. The SQL being run on production server was intercepted with Activity Monitor, Recent Expensive Queries, this timing is also included in the following tables. The following numbers are once DB is is warmed up:

Development: 64 bit OS, SS 2012, 2 cores, 6 GB RAM, IIS Express.

EF 5 ~30 sec
EF 6 ~15 sec
SQL ~26 sec

Production: 64 bit OS, SS 2012, 32 cores, 32 GB RAM, IIS8.

EF 5 ~8 sec
EF 6 ~4 minutes
SQL ~6 sec.

I have included the specs just to give an idea of what relative performance should be. So it appears that when I use EF 6 in my development environment I get performance improvement, when I publish to my production server a huge performance problem. The databases are similar if not exactly the same. All indexes have been rebuilt, the SQL query also seems to indicate that there is no reason to suspect database is at fault. Application pool is .Net 4.0 in production. Both development and production server have .Net 4.5 installed. I don't know what to check next or how to debug this problem, any ideas on what to do or how to debug further?

Update: Using SQL Server Profiler found that EF5 and EF6 produce slightly different TSQL. The TSQL difference is as follows:

EF5: LEFT OUTER JOIN [dbo].[Pins] AS [Extent9] ON [Extent1].[PinId] = [Extent9].[PinID]
EF6: INNER JOIN [dbo].[Pins] AS [Extent9] ON [Extent1].[PinId] = [Extent9].[PinID]

This same TSQL from EF6 also performs differently depending on the server/database the TSQL is executed on. After inspecting query plan for EF6 & slow database (production server SS build 11.0.3000 Enterprise Edition) this plan does all scans and no seeks when compared to an identical instance (test server SS build 11.0.3128 Developers Edition) which has a few seeks that make the difference. Wall clock time is > 4 min for production and 12 sec for small test server. EF places these queries into sp_executesql proc, the intercepted sp_executesql proc was used for the timing mentioned above. I do NOT get slow time (bad query plan) with EF5 or EF6 generated code when executed on development server. Also strange, if I remove TSQL from sp_executesql and run it on production server the query is executed quickly (6 sec). In summary three things need to happen for slow execution plan:

1. Execute on production server build 11.0.3000
2. Use Inner Join with Pins table (EF6 generated code).
3. Execute TSQL inside of sp_executesql.

The test environment was created with a backup of my production data, data on both servers is identical. Could creating a backup and restoring database have fixed some problem with the data? I haven't tried deleting instance and restoring on production server because I would like to know for sure what the problem is before I delete and restore the instance, just in case it does fix the problem. I did try and flush cache with the following TSQL

select DB_ID() 
DBCC Flushprocindb(database_Id)
and 
DBCC FREEPROCCACHE(plan_handle)

Flushing with above did not effect the query plan. Any suggestions what to try next?

Following is the linq query:

    result =
    (
    from p1 in context.CookSales

    join l2 in context.CookSaleStatus on new { ID = p1.PinId, YEAR = year1 } equals new { ID = l2.PinId, YEAR = l2.StatusYear } into list2
    from p3 in list2.DefaultIfEmpty()
    join l3 in context.CookSaleStatus on new { ID = p1.PinId, YEAR = year2 } equals new { ID = l3.PinId, YEAR = l3.StatusYear } into list3
    from p4 in list3.DefaultIfEmpty()
    join l4 in context.CookSaleStatus on new { ID = p1.PinId, YEAR = year3 } equals new { ID = l4.PinId, YEAR = l4.StatusYear } into list4
    from p5 in list4.DefaultIfEmpty()
    join l10 in context.CookSaleStatus on new { ID = p1.PinId, YEAR = year4 } equals new { ID = l10.PinId, YEAR = l10.StatusYear } into list10
    from p11 in list10.DefaultIfEmpty()

    join l5 in context.ILCookAssessors on p1.PinId equals l5.PinID into list5
    from p6 in list5.DefaultIfEmpty()
    join l7 in context.ILCookPropertyTaxes on new { ID = p1.PinId } equals new { ID = l7.PinID } into list7
    from p8 in list7.DefaultIfEmpty()

    join l13 in context.WatchLists on p1.PinId equals l13.PinId into list13
    from p14 in list13.DefaultIfEmpty()

    join l14 in context.Pins on p1.PinId equals l14.PinID into list14
    from p15 in list14.DefaultIfEmpty()
    orderby p1.Volume, p1.PIN
    where p1.SaleYear == userSettings.SaleYear 
    where ((p1.PinId == pinId) || (pinId == null))
    select new SaleView
    {
        id = p1.id,
        PinId = p1.PinId,
        Paid = p1.Paid == "P" ? "Paid" : p1.Paid,
        Volume = p1.Volume,
        PinText = p15.PinText,
        PinTextF = p15.PinTextF,
        ImageFile = p15.FnaImage.TaxBodyImageFile,
        SaleYear = p1.SaleYear,
        YearForSale = p1.YearForSale,
        Unpaid = p1.DelinquentAmount,
        Taxes = p1.TotalTaxAmount,
        TroubleTicket = p1.TroubleTicket,
        Tag1 = p1.Tag1,
        Tag2 = p1.Tag2,
        HasBuildingPermit = p1.Pin1.BuildingPermitGeos.Any(p => p.PinId == p1.PinId),
        BidRate = p1.BidRate,
        WinningBid = p1.WinningBid,
        WinningBidderNumber = p1.BidderNumber,
        WinningBidderName = p1.BidderName,
        TaxpayerName = p1.TaxpayerName,
        PropertyAddress = SqlFunctions.StringConvert((double?)p1.TaxpayerPropertyHouse) + " " + p1.TaxpayerPropertyDirection + " "
                        + p1.TaxpayerPropertyStreet
                        + " " + p1.TaxpayerPropertySuffix +
                        System.Environment.NewLine + (p1.TaxpayerPropertyCity ?? "") + ", " + (p1.TaxpayerPropertyState ?? "") +
                        " " + (p1.TaxpayerPropertyZip ?? ""),
        MailingAddress = (p1.TaxpayerName ?? "") + System.Environment.NewLine + (p1.TaxpayerMailingAddress ?? "") +
                        System.Environment.NewLine + (p1.TaxpayerMailingCity ?? "") + ", " + (p1.TaxpayerMailingState ?? "") +
                        " " + (p1.TaxpayerMailingZip ?? ""),
        Status1 = p3.Status.Equals("Clear") ? null : p3.Status,
        Status2 = p4.Status.Equals("Clear") ? null : p4.Status,
        Status3 = p5.Status.Equals("Clear") ? null : p5.Status,
        Status4 = p11.Status.Equals("Clear") ? null : p11.Status,
        Township = p6.Township,
        AssessorLastUpdate = p6.LastUpdate,
        Age = p6.Age,
        LandSquareFootage = p6.LandSquareFootage,
        BuildingSquareFootage = p6.BuildingSquareFootage,
        CurrLand = p6.CurrLand,
        CurrBldg = p6.CurrBldg,
        CurrTotal = p6.CurrTotal,
        PriorLand = p6.PriorLand,
        PriorBldg = p6.PriorBldg,
        PriorTotal = p6.PriorTotal,
        ClassDescription = p6.ClassDescription,
        Class = p1.Classification == null ? p6.Class.Trim() : p1.Classification,
        TaxCode = p6.TaxCode,
        Usage = p6.Usage,

        Status0 = (p8.CurrentTaxYear != null && p8.CurrentTaxYearPaidAmount == 0) ? "Paid" : null, 
        LastTaxYearPaidAmount = p8.LastTaxYearPaidAmount,
        NoteStatus = p15.PinNotes.Any(p => p.PinId == p15.PinID),
        EntryComment = p1.EntryComment,
        IsInScavenger = p14.IsInScavenger ?? false,
        IsInTbs = p14.IsInTbs ?? false,
        RedeemVts = (p3.Redeemed == "VTS" || p4.Redeemed == "VTS" || p5.Redeemed == "VTS" || p11.Redeemed == "VTS") ? true : false,
        FivePercenter = (p3.FivePercenter || p4.FivePercenter || p5.FivePercenter || p11.FivePercenter) ? true : false,
    }
    ).ToList();

The SQL that is generated with this query seems reasonable. (I have not included it because when I paste it in it's not formatted and difficult to read.)

ricardo
  • 525
  • 5
  • 12
  • 1
    Are you using EF6.0.2? It fixed several perf issues in pure EF6.0.0. – Scott Wylie Mar 17 '14 at 18:45
  • 1
    According to NuGet it's version 6.0.2, however the property for reference EntityFramework indicates the version is 6.0.0.0. I suspect I have the latest version as I just installed this last week from NuGet. – ricardo Mar 17 '14 at 18:53
  • 1
    Have you run the TSQL generated by EF directly on the database? If so, what are the results? It might be worth checking that you don't have EF installed on your production server, and the deployed EF binary is the correct version. The assembly version will be `6.0.0.0` but if you check the properties of the binary it should have a `File Version` like `6.0.21211.0` – Xenolightning Mar 17 '14 at 19:11
  • @Xenolightning, I ran the TSQL on both dev and prod, the timing is included in my table above in the row SQL it's 26 sec and 6 sec respectively. I checked the binary and file version is 6.0.21211.0 in IIS prod bin and in local dev bin, and I have copy local checked. – ricardo Mar 17 '14 at 20:04
  • 5
    We've had issues with queries being slow on production only when running from our code (not when we ran the SQL ourselves) due to **a bad query plan being stuck in the cache**. The workaround was to find and flush the related plan. Also you can use a profiler such as [MiniProfiler](http://miniprofiler.com/) to find out if the time is being spent in the SQL being executed by EF or in some other part of the code. – sinelaw Mar 19 '14 at 02:18
  • Also make sure to restrict the maximum memory of SQL server, otherwise we always ran into performance problems! – D.R. Mar 19 '14 at 13:13
  • Actually 6.0.2 still has performance issues as I saw among discussions out of the stackoverflow. Maybe this can help: http://msdn.microsoft.com/en-us/data/dn582034 – heq Mar 24 '14 at 05:50
  • Tried several different methods of forcing a recompile of query plan, inlcuding exec sp_executesql 'bla bla' with recompile, dbcc freeproccache (handle), delete and restore database. Seems like a SQL Server bug. Next step is to create an exact install of 11.0.3000 Enterprise version and then opening a ticket with MS. Although it's because of EF6 that we have the problem, database is creating issue. – ricardo Mar 24 '14 at 15:31
  • "Execute TSQL inside of sp_executesql" Is there a mis-alignment on field types? In the past I've noticed sp_executesql runs much slower if it sets the input type to a type that's not equivalent to the column type it's comparing against.. This happens a lot where the column type is like an varchar and then sp_executesql is using nvarchar or visa versa.. Is it possible to link to the execution plans of the slow vs fast? That might yield some useful data. – Shane Neuville Mar 25 '14 at 00:31
  • @PureWeen, I think if there was some kind of misalignment with field types the problem would show up on test server, since database is exact copy. I have since deleted and restored database in production so it's identical to two other servers that don't create bad query plan. How can I "link to execution plans"? What do you mean by that? – ricardo Mar 25 '14 at 13:23
  • I feel like the DB being a copy is only part of it. Servers are different cores vs memory size. SQL will make different choices based on that. Frequently Execute vs sp_execute performance is from parameter sniffing (should have been resolved by recompile) or things like nvarchar specified on parameters against varchar columns. The where clause is pretty small so still might be good to just verify in order to eliminate :-) Also I realize this isn't directly related to question but does removing the order by help? Sorting in DB slows down a query and query doesn't appear to need it. – Shane Neuville Mar 25 '14 at 21:31
  • Linking to Execution Plans: Execute the faster versions of the query and then the slow ones and enable "Show execution plan" http://technet.microsoft.com/en-us/library/ms178071(v=sql.105).aspx to see where time is being spent in each case. That would probably lead to some insight on differences. For example these two pans "Execute TSQL vs sp_executesql" on production to see what's different about the plans. And if possible clear the plan cache between generating the execution plans. Or you can save the plans out to XML http://technet.microsoft.com/en-us/library/ms190646.aspx – Shane Neuville Mar 25 '14 at 21:33
  • @PureWeen, in my "update" I mentioned that I did look at execution plans. The difference is two seeks vs all scans so I know why it's slow when it's slow, because of the scans. That doesn't help me because if I run the same TSQL outside of sp_executesql() the query runs fine. So why is it slow with sp_executesql(), that's part of it. Remember, I can't choose to execute without it. What is it about sp_executesql() that it uses a bad query plan even with the recompile option? – ricardo Mar 27 '14 at 00:25
  • I saw but there are more interesting details beyond some are scans and some are seeks. SQL will seek when the arguments are sargable and there's an index to use and it will scan when there isn't one or the statistics tell it otherwise. The fact that recompile doesn't work seems to indicate that it's not a bad query plan but that SQL just can't make the right choices. So I was curious to see if the nitty gritty of the plan would shed some light as to why. Like what index and argument is specifically becoming a scan over a seek? It'd be nice to know what predicate is changing in behavior. – Shane Neuville Mar 27 '14 at 07:16
  • This is sort of an example of the path of curiosity I'm going down http://www.sqlservercentral.com/Forums/Topic883230-360-1.aspx And then trying to narrow it down among these possible scenarios "almost certainly parameter sniffing on the initial query, wrong datatype, or widely disparate values leading to suboptimal plan in cache. There are ways to deal with each of those." – Shane Neuville Mar 27 '14 at 07:20

1 Answers1

10

While researching this problem I have unearthed several things about SQL Server that I didn't know. This may be common knowledge for some, but for me it wasn't. Here are my overall highlights.

  1. EF uses dynamic sql for all queries specifically sp_exectutesql(). sp_executesql() executes dynamic SQL, if you remove this SQL and execute as adhoc query in SSMS don't expect to get the same performance results. This is documented well here and references this document which I highly recommend reading if you are having these types of issues.
  2. EF5 produces different dynamic SQL than EF6 under certain conditions.
  3. It's difficult to optimize linq to entities because you may get different results depending on hardware, this is explained in the references. My original goal was to optimize the linq query when I upgraded to EF6. I noticed that not using navigation properties improved performance in my dev and test servers, but killed it in production.
  4. Final result with acceptable performance, in all environments, was a combination of join and navigation properties. In the end If I had used all navigation properties it would have worked better from the start. The join keys used were from the wrong tables, when you write ad hoc SQL it doesn't matter but it must for dynamic SQL. Had I used the navigation there would not have been any keys to get wrong. However, the best performance was with one join and the remaining navigation properties. The dynamic SQL generated is remarkably similar for all scenarios, but the SQL Server query plan optimizer gets better clues when navigation properties are used (this is a guess).

The key part of linq changed was this:

                from p1 in context.CookSales
                join p15 in context.Pins on p1.PinId equals p15.PinID
                where p1.SaleYear == userSettings.SaleYear
                where ((p1.PinId == pinId) || (pinId == null))
                orderby p1.Volume, p1.PIN
                select new SaleView bla bla

The Pins table contains the primary key for PinId while all the other tables have PinId as foreign key. Keeping Pins as join and not a navigation property improved performance.

Community
  • 1
  • 1
ricardo
  • 525
  • 5
  • 12