I'm using `Concurrency::parallel_for()` from Visual Studio 2010's Parallel Patterns Library (PPL) to process an indexed set of tasks (typically, the index set is much larger than the number of threads that can run simultaneously). Each task, before doing a lengthy calculation, starts by requesting a private working-storage resource from a shared resource manager (in this case: a view on a task-specific memory-mapped file, but I think the story would be the same if each task requested a private memory allocation from a shared heap).
Access to the shared resource manager is synchronized with a `Concurrency::critical_section`, and here the problem starts: if a first thread/task is inside the critical section and a second task makes a request, it has to wait until the first task's request is handled. The PPL apparently reasons: this thread is waiting and there are more tasks to do, so another thread is created, ultimately causing up to 870 threads, most of them waiting at the same resource manager.
Now, as handling the resource request is only a small part of the whole task, I would like to tell the PPL to hold its horses at that point: no wait or cooperative block inside an indicated section of a worker thread should cause new threads to be started. So my question is: can I prevent a specific section of a thread from triggering the creation of new threads, even if it blocks cooperatively, and if so, how? I wouldn't mind new threads being created at other blocking points further down the thread's processing path, but no more than, say, 2x the number of (hyper-threaded) cores.
Alternatives that I have considered so far:
- Queue up the tasks and process the queue from a limited number of threads. Issue: I had hoped PPL's `parallel_for` would do that by itself.
- Define a `Concurrency::combinable<Resource> resourceSet;` outside the `Concurrency::parallel_for` and initialize `resourceSet.local()` once, to reduce the number of resource requests (by reusing the resources) to the number of threads (which should be less than the number of tasks). Issue: this optimization doesn't prevent the superfluous thread creation.
- Pre-allocate the required resources for each task outside the `parallel_for` loop. Issue: this would claim too many system resources, whereas limiting the amount of resources to the number of threads/cores would be OK (if the thread count didn't explode).
I have read http://msdn.microsoft.com/en-us/library/ff601930.aspx, section "Do Not Block Repeatedly in a Parallel Loop", but following the advice there would result in no parallel threads at all.