
Possible Duplicate:
Parallel.ForEach vs Task.Factory.StartNew

I need to run about 1,000 tasks in a ThreadPool on a nightly basis (the number may grow in the future). Each task performs a long-running operation (reading data from a web service) and is not CPU intensive. Async I/O is not an option for this particular use case.

Given an IList<string> of parameters, I need to call DoSomething(string x) for each one. I am trying to pick between the following two options:

IList<Task> tasks = new List<Task>();
foreach (var p in parameters)
{
    tasks.Add(Task.Factory.StartNew(() => DoSomething(p), TaskCreationOptions.LongRunning));
}
Task.WaitAll(tasks.ToArray());

OR

Parallel.ForEach(parameters, new ParallelOptions {MaxDegreeOfParallelism = Environment.ProcessorCount*32}, DoSomething);

Which option is better and why?

Note:

The answer should include a comparison between the usage of TaskCreationOptions.LongRunning and MaxDegreeOfParallelism = Environment.ProcessorCount * SomeConstant.

Zaid Masud
  • Possible duplicate: http://stackoverflow.com/q/5009181/21727 – mbeckish May 21 '12 at 15:18
  • Running 1,000 threads will consume almost 1 gigabyte of virtual memory with a default stack size of 1 megabyte. Obviously, using TPL or PLINQ you will not be running that many threads, but I just wanted to point out the fact that a thread even though it is sleeping has a cost to the process. – Martin Liversage May 21 '12 at 15:27
  • @mbeckish not a duplicate -- other question does not wait all on Tasks, nor does it specify LongRunning / MaxDegreeOfParallelism. It's a fundamentally different problem. – Zaid Masud May 21 '12 at 16:07
  • @MartinLiversage yes, if they all run concurrently which we clearly don't want. I am looking for an implementation that will give me the best performance given the problem without changing the problem. – Zaid Masud May 21 '12 at 16:08
  • @zooone9243: Perhaps I'm nitpicking a bit and it is just a language thing, but on a 4 core machine only 4 threads run concurrently. You can still create 1,000 threads (unless you run out of memory space) but a proper solution is to use threads from a pool __exactly as you intend to do__. It is the _I need to run about 1,000 threads_ part that confuses me. – Martin Liversage May 21 '12 at 16:18
  • @MartinLiversage I see your point. I have updated the text, hope that it is clearer? Thanks for pointing it out. – Zaid Masud May 21 '12 at 16:21
  • Can't you use asynchronous I/O and store state in some objects instead of having thousands of threads hanging around that most of the time do nothing but take up resources? – Wormbo May 21 '12 at 16:55
  • @Wormbo thanks for the suggestion, but without getting into too many details, I'm sorry to say async I/O is not an option for this particular case. – Zaid Masud May 21 '12 at 17:00
  • 10,000 threads seems pretty unworkable to me. Even with `TaskCreationOptions.LongRunning` the task scheduler will not allocate threads to all those tasks at once. You will end up with only a subset running with the others waiting until they are finished. Is there no way you can refactor your code? – GazTheDestroyer May 21 '12 at 15:14
  • @zooone9243 do you realize that 10000 threads would take up 10gb of memory in stack space alone? Way more if your threads are doing work with objects. I think GazTheDestroyer is right, you may need to find a way to refactor your code. Using async I/O for sockets throws the work on the kernel, and you can at least save memory by having far fewer threads. – Christopher Currens May 21 '12 at 15:23
  • @ChristopherCurrens et al. no need to get hung up or carried away on this point. It is distracting attention from the real question. I am in no way implying that 10,000 threads need to run concurrently. Of course they can be partitioned and of course they need to effectively use the ThreadPool, the whole purpose of which is not to take up 10GB in stack space. – Zaid Masud May 21 '12 at 15:29
  • Guys, I have found some documentation implying that the first option may actually create 1,000 threads, but I haven't been able to fully verify it... see here http://stackoverflow.com/questions/3105988/taskcreationoptions-longrunning-option-and-threadpool. If this is true, it would rule out the first option for my use case. – Zaid Masud May 21 '12 at 17:51
  • Please don't tell us that async I/O is not an option without explaining it. This sounds like an [X/Y problem](http://meta.stackexchange.com/q/66377/141911) if I've ever heard one. Async I/O is **the** correct way to perform these types of tasks. If you're sure it doesn't apply in your case, then **explain** your problem so that we can actually try to provide the best possible solution. – Aaronaught May 21 '12 at 18:07
  • @Aaronaught in a nutshell the web service wrapper that we have to use in this business context doesn't support async I/O. – Zaid Masud May 21 '12 at 18:11
  • Then modify it so that it does. This is an architecture problem, not a performance concern. You will not get acceptable performance using the TPL (which includes both `Task` and `Parallel`). At best you're asking to choose between the lesser of two *grave* evils. – Aaronaught May 21 '12 at 18:16
  • Just an aside, I realize this is an old question: Task/Parallel are 4.0 features. async is a 4.5 feature (yes I realize there was a CTP). So it could just be mandated from God that only true 4.0 features will be in the code. Or, as the questioner mentions, the web services library that must be used might be third party, with no ability to modify it and decorate everything with async/await. – Mike Mar 13 '13 at 21:53
  • Something to note here - if you use Parallel.ForEach on long-running (I/O-bound) tasks, the thread scheduler gets impatient. It assumes that the reason for the slow progress is that tasks are overly CPU intensive, so it starts adding threads to the thread pool at a rate of 2/minute. It basically "leaks" threads in this manner until the parallel foreach is complete. – Steven Padfield Mar 19 '13 at 02:48

3 Answers


Perhaps you aren't aware of this, but the members in the Parallel class are simply (complicated) wrappers around Task objects. In case you're wondering, the Parallel class creates the Task objects with TaskCreationOptions.None. However, the MaxDegreeOfParallelism would affect those task objects no matter what creation options were passed to the task object's constructor.

TaskCreationOptions.LongRunning gives a "hint" to the underlying TaskScheduler that it might perform better with oversubscription of threads. Oversubscription is good for high-latency work, for example I/O, because it will assign more than one thread (yes, thread, not task) to a single core so that it will always have something to do, instead of waiting around for an operation to complete while the thread is in a waiting state. On the TaskScheduler that uses the ThreadPool, it will run LongRunning tasks on their own dedicated thread (the only case where you have a thread per task); otherwise it will run normally, with scheduling and work stealing (really, what you want here anyway).
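
Here's a minimal sketch, assuming the default ThreadPool-based scheduler, that makes that behavior visible by checking Thread.CurrentThread.IsThreadPoolThread from inside each kind of task:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class LongRunningDemo
    {
        static void Main()
        {
            // A normal task runs on a ThreadPool thread (prints True here).
            var normal = Task.Factory.StartNew(
                () => Console.WriteLine("Pool thread? " + Thread.CurrentThread.IsThreadPoolThread));

            // With LongRunning, the default scheduler typically gives the task its own
            // dedicated thread instead of a pool thread (prints False here).
            var longRunning = Task.Factory.StartNew(
                () => Console.WriteLine("Pool thread? " + Thread.CurrentThread.IsThreadPoolThread),
                TaskCreationOptions.LongRunning);

            Task.WaitAll(normal, longRunning);
        }
    }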

MaxDegreeOfParallelism controls the number of concurrent operations run. It's similar to specifying the maximum number of partitions that the data will be split into and processed from. If TaskCreationOptions.LongRunning could be specified as well, all it would do is limit the number of tasks running at a single time, much like a TaskScheduler whose maximum concurrency level is set to that value (see the sketch below).
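
A small sketch of that limiting effect: it counts how many operations are in flight at once under Parallel.ForEach, and the peak never exceeds MaxDegreeOfParallelism.

    using System;
    using System.Linq;
    using System.Threading;
    using System.Threading.Tasks;

    class MaxDopDemo
    {
        static void Main()
        {
            int running = 0, peak = 0;
            var gate = new object();

            Parallel.ForEach(
                Enumerable.Range(0, 100),
                new ParallelOptions { MaxDegreeOfParallelism = 4 },
                i =>
                {
                    lock (gate) { running++; if (running > peak) peak = running; }
                    Thread.Sleep(50); // stand-in for the long-running web service call
                    lock (gate) { running--; }
                });

            Console.WriteLine("Peak concurrency: " + peak); // never more than 4
        }
    }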

You might want the Parallel.ForEach. However, setting MaxDegreeOfParallelism to such a high number actually won't guarantee that there will be that many threads running at once, since the tasks will still be controlled by the ThreadPoolTaskScheduler. That scheduler will keep the number of threads running at once to the smallest amount possible, which I suppose is the biggest difference between the two methods. You could write (and specify) your own TaskScheduler that would mimic the max degree of parallelism behavior, and have the best of both worlds, but I'm doubting that's something you're interested in doing.

My guess is that, depending on latency and the number of actual requests you need to make, using tasks will perform better in many(?) cases, though they will wind up using more memory, while Parallel will be more consistent in resource usage. Of course, async I/O will perform monstrously better than either of these two options, but I understand you can't do that because you're using legacy libraries. So, unfortunately, you'll be stuck with mediocre performance no matter which one you choose.

A real solution would be to figure out a way to make async I/O happen. With async I/O your program (read: thread) continues execution while the kernel waits for the I/O operation to complete (this is also known as using I/O completion ports). Because the thread is not in a waiting state, the runtime can do more work on fewer threads, which usually ends up in an optimal relationship between the number of cores and number of threads. Adding more threads, as much as I wish it would, does not equate to better performance (actually, it can often hurt performance, because of things like context switching). Since I don't know your situation, I don't think I can be more helpful than that.
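
For reference, a rough sketch of what async I/O can look like on .NET 4 (before async/await), using HttpWebRequest's Begin/End methods wrapped with Task.Factory.FromAsync. The URLs are placeholders, and the response body is still read synchronously to keep things short:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Net;
    using System.Threading.Tasks;

    class AsyncIoSketch
    {
        // Starts an HTTP request without tying up a thread while waiting for the response.
        static Task<string> DownloadAsync(string url)
        {
            var request = (HttpWebRequest)WebRequest.Create(url);

            return Task.Factory
                .FromAsync<WebResponse>(request.BeginGetResponse, request.EndGetResponse, null)
                .ContinueWith(t =>
                {
                    using (var response = t.Result)
                    using (var reader = new StreamReader(response.GetResponseStream()))
                    {
                        // Reading the body is kept synchronous here for brevity.
                        return reader.ReadToEnd();
                    }
                });
        }

        static void Main()
        {
            // Placeholder URLs standing in for the real web service calls.
            var urls = new List<string> { "http://example.com/a", "http://example.com/b" };

            Task<string>[] downloads = urls.Select(DownloadAsync).ToArray();

            // Only this one waiting thread blocks; no thread is parked per request
            // while the kernel completes the I/O.
            Task.WaitAll(downloads);
        }
    }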

However, this entire answer is useless in determining a final answer for your question, though I hope it gives you some needed direction. You won't know what performs better until you profile it. If you don't try them both (I should clarify that I mean the Task option without LongRunning, letting the scheduler handle thread switching) and profile them to determine what is best for your particular use case, you're selling yourself short.

Christopher Currens
  • Thanks for a great answer. I wonder why, if the Parallel class creates Task objects, it is able to create foreground threads, whereas the Task library creates background threads and doesn't seem to give you the option to create foreground threads? – Zaid Masud May 22 '12 at 13:46
  • @zooone9243 - It's not actually creating foreground threads. Instead, it just calls Wait(), which blocks execution until it's finished or canceled. – Christopher Currens May 22 '12 at 14:35
  • @zooone9243 - It's a little bit more complicated than I'm making it out to be. If you want to get a good understanding of the inner workings, I'd recommend you check out the [.NET Reference Source](http://referencesource.microsoft.com/) – Christopher Currens May 22 '12 at 16:34

Both options are entirely inappropriate for your scenario.

TaskCreationOptions.LongRunning is certainly a better choice for tasks that are not CPU-bound, as the TPL (Parallel classes/extensions) is almost exclusively meant for maximizing the throughput of a CPU-bound operation by running it on multiple cores (not threads).

However, 1000 tasks is an unacceptable number for this. Whether or not they're all running at once isn't exactly the issue; even 100 threads waiting on synchronous I/O is an untenable situation. As one of the comments suggests, your application will be using an enormous amount of memory and end up spending almost all of its time in context-switching. The TPL is not designed for this scale.

If your operations are I/O bound - and if you are using web services, they are - then async I/O is not only the correct solution, it's the only solution. If you have to re-architect some of your code (such as, for example, adding asynchronous methods to major interfaces where there were none originally), do it, because I/O completion ports are the only mechanism in Windows or .NET that can properly support this particular type of concurrency.
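
As a rough sketch of what "adding asynchronous methods" can look like: if the wrapper exposes (or can be extended to expose) a Begin/End pair, it can be surfaced as a Task. The names IDataService, BeginDoSomething, EndDoSomething and DoSomethingAsync below are purely illustrative, not part of any real library:

    using System;
    using System.Threading.Tasks;

    // Hypothetical wrapper interface around the third-party web service client.
    public interface IDataService
    {
        IAsyncResult BeginDoSomething(string parameter, AsyncCallback callback, object state);
        string EndDoSomething(IAsyncResult asyncResult);
    }

    public static class DataServiceExtensions
    {
        // Surfaces the Begin/End pair as a Task so callers can compose and wait on it
        // without dedicating a thread to each in-flight call.
        public static Task<string> DoSomethingAsync(this IDataService service, string parameter)
        {
            return Task<string>.Factory.FromAsync(
                service.BeginDoSomething, service.EndDoSomething, parameter, null);
        }
    }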

I've never heard of a situation where async I/O was somehow "not an option". I cannot even conceive of any valid use case for this constraint. If you are unable to use async I/O then this would indicate a serious design problem that must be fixed, ASAP.

Aaronaught
  • "I've never heard of a situation where async I/O was somehow 'not an option'" ... each web service call requires an expensive handshake. Establishing a connection is the real killer, almost more so than the actual calls themselves. My knowledge of IO completion ports is limited, can they be used for this scenario? If you have any good references on these please share. Thanks. – Zaid Masud May 21 '12 at 18:58
  • @zooone9243, I don't see why that would have to mean you can't use async IO. And it's hard to tell you how exactly to do that unless your tell us more. – svick May 21 '12 at 19:01
  • @svick I need to learn more about these I/O completion ports... are we talking about using unmanaged Windows I/O threads as discussed here? http://blogs.msdn.com/b/ericeil/archive/2008/06/20/windows-i-o-threads-vs-managed-i-o-threads.aspx – Zaid Masud May 21 '12 at 19:05
  • @zooone9243, no, it means using `BeginXxx()`/`EndXxx()` methods instead of just the `Xxx()` method. Which method(s) exactly we're talking about depends on what exactly you are doing (they can be methods on `WebRequest` or `Socket` or maybe something else). The `Begin`/`End` methods then use I/O completion ports internally. – svick May 21 '12 at 19:12
  • @zooone9243: Perhaps I've misinterpreted the meaning of "web service" here, but in my experience, the entire premise of a web service is that it uses a standard web protocol and format (i.e. SOAP, XML, JSON, all over HTTP or HTTPS) which you *don't* need a proprietary library to access. Is this some totally opaque binary-encoded RPC service for which you have no source or specifications? – Aaronaught May 21 '12 at 19:42

While this is not a direct comparison, I think it may help you. I do something similar to what you describe (in my case I know there is a load-balanced server cluster on the other end serving REST calls). I get good results using Parallel.ForEach to spin up an optimal number of worker threads, provided that I also use the following code to tell my operating system it can open more than the usual number of connections to an endpoint.

    // 'uri' is assumed to be the System.Uri of the service endpoint you will be calling.
    var servicePoint = System.Net.ServicePointManager.FindServicePoint(uri);
    servicePoint.ConnectionLimit = 250; // the default is only 2 concurrent connections per endpoint

Note that you have to call that once for each unique URL you connect to.
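
For example (a sketch; endpointUrls is a hypothetical list of the service addresses you call):

    // Raise the per-endpoint connection limit for each unique service URL.
    foreach (var url in endpointUrls)
    {
        var sp = System.Net.ServicePointManager.FindServicePoint(new Uri(url));
        sp.ConnectionLimit = 250;
    }

    // Alternatively, raise the default for every endpoint the process talks to:
    System.Net.ServicePointManager.DefaultConnectionLimit = 250;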

Aaron Anodide