
I have a .NET Core 1.1 API with EF Core 1.1, using Microsoft's vanilla setup of dependency injection to provide the DbContext to my services. (Reference: https://docs.microsoft.com/en-us/aspnet/core/data/ef-mvc/intro#register-the-context-with-dependency-injection)

Now I am looking into parallelizing database reads as an optimization, using WhenAll.

So instead of:

var result1 = await _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var result2 = await _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp); 

I use:

var repositoryTask1 = _dbContext.TableModel1.FirstOrDefaultAsync(x => x.SomeId == AnId);
var repositoryTask2 = _dbContext.TableModel2.FirstOrDefaultAsync(x => x.SomeOtherProp == AProp);
(var result1, var result2) = await (repositoryTask1, repositoryTask2).WhenAll();
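To be clear, the tuple WhenAll() above is not a framework method; it's a small extension method I have defined, roughly along these lines:

```csharp
using System.Threading.Tasks;

public static class TaskTupleExtensions
{
    // Awaits both tasks and returns their results as a tuple, so callers
    // can write: var (a, b) = await (taskA, taskB).WhenAll();
    public static async Task<(T1, T2)> WhenAll<T1, T2>(this (Task<T1>, Task<T2>) tasks)
    {
        await Task.WhenAll(tasks.Item1, tasks.Item2);
        return (tasks.Item1.Result, tasks.Item2.Result);
    }
}
```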

This is all well and good, until I use the same strategy outside of these DB Repository access classes and call these same methods with WhenAll in my controller across multiple services:

var serviceTask1 = _service1.GetSomethingsFromDb(Id);
var serviceTask2 = _service2.GetSomeMoreThingsFromDb(Id);
(var dataForController1, var dataForController2) = await (serviceTask1, serviceTask2).WhenAll();

Now when I call this from my controller, randomly I will get concurrency errors like:

System.InvalidOperationException: ExecuteReader requires an open and available Connection. The connection's current state is closed.

The reason, I believe, is that sometimes these threads try to access the same tables at the same time. I know that this is by design in EF Core, and if I wanted to I could create a new DbContext every time, but I am trying to see if there is a workaround. That's when I found this good post by Mehdi El Gueddari: http://mehdi.me/ambient-dbcontext-in-ef6/

In which he acknowledges this limitation:

an injected DbContext prevents you from being able to introduce multi-threading or any sort of parallel execution flows in your services.

And offers a custom workaround with DbContextScope.

However, he presents a caveat: even DbContextScope won't work in parallel (what I'm trying to do above):

if you attempt to start multiple parallel tasks within the context of a DbContextScope (e.g. by creating multiple threads or multiple TPL Task), you will get into big trouble. This is because the ambient DbContextScope will flow through all the threads your parallel tasks are using.

His final point here leads me to my question:

In general, parallelizing database access within a single business transaction has little to no benefits and only adds significant complexity. Any parallel operation performed within the context of a business transaction should not access the database.

Should I not be using WhenAll in this case in my controllers, and stick with awaiting one by one? Or is dependency injection of the DbContext the more fundamental problem here, meaning a new context should instead be created/supplied every time by some kind of factory?

starmandeluxe
  • What is `Repository1`? If you're using EF you don't need to use the Repository (Anti-)Pattern because that's what your `dbContext` is. You can have a class that has methods that generate queries, sure, but that isn't a repository, it's just a wrapper that provides queries. – Dai May 22 '17 at 00:26
  • It sounds like you're returning a `Task` directly from within a `using(){}` block - instead use an `await` inside a `using(){}` block - this will not affect the ability to parallelize your code. – Dai May 22 '17 at 00:27
  • Nowhere do I indicate that I have a using block. In fact, I explicitly say I am using DI. Please read the links I have referenced. – starmandeluxe May 22 '17 at 00:46
  • I have updated the name of the DbContext's "repository" so it is less confusing that it's not some kind of repository class, but referring to a DbSet of the model class – starmandeluxe May 24 '17 at 03:29

2 Answers


Using any context.XyzAsync() method is only useful if you either await the called method or return control to a calling thread that doesn't have the context in its scope.

A DbContext instance isn't thread-safe: you should never ever use it in parallel threads. To be safe, never use it in multiple threads at all, even if they don't run in parallel. Don't try to work around it.

If for some reason you want to run parallel database operations (and think you can avoid deadlocks, concurrency conflicts etc.), make sure each one has its own DbContext instance. Note however, that parallelization is mainly useful for CPU-bound processes, not IO-bound processes like database interaction. Maybe you can benefit from parallel independent read operations but I would certainly never execute parallel write processes. Apart from deadlocks etc. it also makes it much harder to run all operations in one transaction.
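In practice, "its own DbContext instance" for parallel reads means something along these lines (a sketch; `BloggingContext`, `Blog` and `Post` are placeholder types I'm assuming, not types from the question, and the options would come from your configuration):

```csharp
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public class ParallelReadService
{
    private readonly DbContextOptions<BloggingContext> _options;

    public ParallelReadService(DbContextOptions<BloggingContext> options)
    {
        _options = options;
    }

    public async Task<(Blog, Post)> GetInParallelAsync(int blogId, int postId)
    {
        async Task<Blog> GetBlogAsync()
        {
            // Each task owns its own short-lived context.
            using (var db = new BloggingContext(_options))
                return await db.Blogs.FirstOrDefaultAsync(b => b.Id == blogId);
        }

        async Task<Post> GetPostAsync()
        {
            using (var db = new BloggingContext(_options))
                return await db.Posts.FirstOrDefaultAsync(p => p.Id == postId);
        }

        var blogTask = GetBlogAsync();
        var postTask = GetPostAsync();
        await Task.WhenAll(blogTask, postTask); // safe: no context is shared between tasks
        return (blogTask.Result, postTask.Result);
    }
}
```

Note that each query is awaited inside its own using block, so no context is disposed before its query completes, and no context ever crosses a thread boundary.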

In ASP.NET Core you'd generally use the context-per-request pattern (ServiceLifetime.Scoped, see here), but even that can't keep you from transferring the context to multiple threads. In the end, only the programmer can prevent that.
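For reference, the context-per-request registration is the standard Startup wiring (a sketch; `BloggingContext` and the connection string name are placeholders):

```csharp
// In Startup.ConfigureServices. AddDbContext registers the context with
// ServiceLifetime.Scoped by default: one instance per HTTP request.
public void ConfigureServices(IServiceCollection services)
{
    services.AddDbContext<BloggingContext>(options =>
        options.UseSqlServer(Configuration.GetConnectionString("DefaultConnection")));
    services.AddMvc();
}
```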

If you're worried about the performance costs of creating new contexts all the time: don't be. Creating a context is a lightweight operation, because the underlying model (store model, conceptual model + mappings between them) is created once and then stored in the application domain. Also, a new context doesn't create a physical connection to the database. All ASP.NET database operations run through the connection pool, which manages a set of physical connections.

If all this implies that you have to reconfigure your DI to align with best practices, so be it. If your current setup passes contexts to multiple threads, a poor design decision was made in the past. Resist the temptation to postpone the inevitable refactoring with workarounds. The only workaround is to de-parallelize your code, so in the end it may even be slower than if you redesigned your DI and code to adhere to context-per-thread.

Gert Arnold
  • Thanks. Unfortunately, all this information that you've given is stuff that I am already aware of and/or have stated in my question. Specifically, when you say "If for some reason you want to run parallel database operations", I would like to know what reasons, if any, there would be to do this (which seems to depart from Microsoft's recommended examples). This is because I am trying to AVOID creating a new context every time as I stated in my question, as it will incur not only possible memory costs but a redesign of the Dependency Injection code as well. – starmandeluxe May 23 '17 at 05:32
  • *I would like to know what reasons* Well, you bring it up, so I guess you have your reasons. I tried to elaborate some more, but that doesn't change much, I'm afraid. – Gert Arnold May 23 '17 at 08:40
  • Actually I don't have any good reasons. I don't feel it is necessary to make this part of my question, but it is not as simple as you suggest: I have a colleague who thinks the "only correct way" to access the DB is to stick with parallel-only db access, presumably for performance reasons. This question is in part to explore what is the best practice for such cases, as I have doubts that it is a best practice at all. – starmandeluxe May 24 '17 at 00:53
  • When you say "even slower", do you mean that the API's performance using DI will be slower than creating new DbContexts in each access call? If so, why would Microsoft recommend this strategy in their official docs? – starmandeluxe May 24 '17 at 02:15
  • No, I mean that de-parallelizing code that's designed to run parallel may get slower than necessary because it may also remove the possibility to parallel when it's not harmful. As for "the only correct way", that's quite absurd. It's common knowledge that the effect of parallelizing IO-bound processes is limited. It *is* simple: if architecture allows contexts to be shared by multiple threads the design is flawed and should be refactored. – Gert Arnold May 24 '17 at 09:21
  • I'm a bit confused here, when adding an API controller in Visual Studio, it generates all async and Task<> methods.. Based on this article, I've decided it's best to go to them all and remove async, Task<>, and await designations. Is this what you're saying? It doesn't let me just make my DAO implementation synchronous, well, it does ALLOW me to do it, but it warns that it will run synchronously in the IDE on the controller, I guess because everything called isn't async? – Dan Chase Aug 14 '19 at 20:18
  • @DanChase No, because each action method will have its own controller instance and, hence, request scope and context (if you do it right). Just don't fan out into multiple threads off of the action method. – Gert Arnold Aug 14 '19 at 20:25
  • An old thread, but I'd love it if you could shed some light on how the implementation details differ from the perception here. If all I'm doing is reading data and I want to pull records from different tables to aggregate into a master payload, it seems to me that by creating a task for each query and using waitall() to send them, I ought to rely on >1 (pooled) connection, and so the DB would respond to each as if they were separate clients, and hence faster as compared to serialized queries? – tntwyckoff Jul 06 '20 at 21:36

It came to the point where really the only way to answer the debate was to do a performance/load test to get comparable, empirical, statistical evidence so I could settle this once and for all.

Here is what I tested:

Cloud Load test with VSTS @ 200 users max for 4 minutes on a Standard Azure webapp.

Test #1: 1 API call with Dependency Injection of the DbContext and async/await for each service.

Results for Test #1: (screenshot of load test results)

Test #2: 1 API call with new creation of the DbContext within each service method call and using parallel thread execution with WhenAll.
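The per-call context pattern in Test #2 was along these lines (a sketch; `MyDbContext`, `_options` and the query are placeholders, not the actual confidential queries):

```csharp
// Sketch of a Test #2 service method: a fresh context is created per call
// and the query is awaited inside the using block.
public async Task<TableModel1> GetSomethingsFromDb(int id)
{
    using (var db = new MyDbContext(_options))
    {
        return await db.TableModel1.FirstOrDefaultAsync(x => x.SomeId == id);
    }
}
```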

Results for Test #2: (screenshot of load test results)

Conclusion:

For those who doubt the results, I ran these tests several times with varying user loads, and the averages were basically the same every time.

The performance gains from parallel processing are, in my opinion, insignificant, and they do not justify abandoning dependency injection, which would create development overhead and maintenance debt, potential for bugs if handled wrong, and a departure from Microsoft's official recommendations.

One more thing to note: as you can see, there were actually a few failed requests with the WhenAll strategy, even when ensuring a new context is created every time. I am not sure of the reason for this, but I would much prefer no 500 errors over a 10 ms performance gain.

starmandeluxe
  • Nice one! What kind of db calls did you test here? – Gert Arnold May 24 '17 at 06:55
  • Sorry, it's kind of confidential information within my company...I can tell you it is accessing multiple tables across multiple services with various kinds of complex queries and aggregations of the results. See my question for examples of the syntax I am using. – starmandeluxe May 24 '17 at 07:49
  • OK, I understand, but does it also include writes? – Gert Arnold May 24 '17 at 09:10
  • No. Queries only. – starmandeluxe May 25 '17 at 00:49
  • FYI, the 500 errors were due to using WhenAll across multiple queries within the new DbContext using statement, so I had to change it to await here instead. Apparently, even if you have a new context, you cannot parallelize DB access in this manner. WhenAll will work fine on the controller level which doesn't work within a DbContext scope. Finally, it did not seem to affect performance whatsoever to change it to await here, the averages were 164ms for my API call. To conclude, WhenAll seems to be OK with DI (for some reason), but not within a context scope in a using statement. – starmandeluxe May 30 '17 at 01:40
  • This is your result and should not be the answer. Others may get other results. Some DB types are optimized for heavily distributed queries and have high latency. Network conditions such as latency or retries, database locking, overloaded databases, large queries, etc. can all cause latency. When the users are using the app daily you may want the app to be as fast and responsive as possible. – Tedd Hansen Mar 06 '20 at 09:18