
I'm using Concurrency::parallel_for() from Visual Studio 2010's Parallel Patterns Library (PPL) to process an indexed set of tasks (typically, the index set is much larger than the number of threads that can run simultaneously). Each task, before doing a lengthy calculation, starts by requesting a private working-storage resource from a shared resource manager (in this case: a view on a task-specific memory-mapped file, but I think the story line would be the same if each task requested a private memory allocation from a shared heap).

Usage of the shared resource manager is synchronized with a Concurrency::critical_section, and here the problem starts: if a first thread/task is inside the critical section and a second task makes a request, it has to wait until the first task's request has been handled. The PPL apparently then reasons: hey, this thread is waiting and there are more tasks to do, so let's create another thread. That results in up to 870 threads, mostly waiting at the same resource manager.

Now, as handling the resource request is only a small part of the whole task, I would like to tell the PPL to hold its horses at that point: none of the waits or cooperative blocks within an indicated section of a worker thread should cause new threads to start. So my question is: can I prevent a specific section of a thread from triggering the creation of new threads, even if it cooperatively blocks, and if so, how? I wouldn't mind new threads being created at other blocking points further down the thread's processing path, but no more than, say, 2x the number of (hyper)cores.

Alternatives that I have considered so far:

  1. Queue up tasks and process the queue from a limited number of threads. Issue: I had hoped PPL's parallel_for would do that by itself.

  2. Define a `Concurrency::combinable<Resource> resourceSet;` outside the `Concurrency::parallel_for` and initialize `resourceSet.local()` once per thread, to reduce the number of resource requests (by reusing resources) to the number of threads (which should be fewer than the number of tasks). Issue: this optimization doesn't prevent the superfluous thread creation.

  3. Pre-allocate the required resources for each task outside the parallel_for loop. Issue: this would claim too many system resources at once, whereas limiting the amount of resources to the number of threads/cores would be fine (if that number didn't explode).

I read http://msdn.microsoft.com/en-us/library/ff601930.aspx, section "Do Not Block Repeatedly in a Parallel Loop", but following that advice here would result in no parallel threads at all.

Niall
    I just found http://stackoverflow.com/questions/9990363/thread-ids-with-ppl-and-parallel-memory-allocation?rq=1 which is very similar to this question. – Maarten Hilferink Oct 02 '14 at 14:51

2 Answers


I do not know if it is possible to configure PPL/ConcRT to not use cooperative synchronization, or at least to put a limit on the number of threads it creates. I thought it might be controlled via scheduler policies, but seemingly none of the policy parameters suits the purpose.

However, I have some suggestions you might find useful to mitigate the problem, even if not in an ideal way:

  • Instead of critical_section, use a non-cooperative synchronization primitive to protect the resource manager. I think (though I did not check) that the classic WinAPI CRITICAL_SECTION should work: blocking on it does not tell the scheduler that the thread is idle, so the scheduler has no reason to inject extra threads. As a more radical step in this direction, you may consider other parallel libraries for your code; e.g. Intel's TBB provides most of the PPL API and has more (disclaimer: I'm affiliated with it).

  • Pre-allocate a number of resources outside the parallel loop. One resource per task is not necessary; one per thread should be sufficient. Put these resources into a concurrent_queue; inside a task, pop a resource from the queue, use it, and then push it back. Instead of returning the resource to the queue, a thread might also hoard it inside a combinable object for reuse in other tasks. If the queue happens to be empty (e.g. if PPL oversubscribes the machine), there are different possible approaches: e.g. spinning in a loop until some other thread returns a resource, or requesting another resource from the manager. You may also choose to pre-allocate more resources than the number of threads, to minimize the chance of resource exhaustion.

Alexey Kukanov
    Thanks, I will consider and test non-cooperative sync (although my resource manager is also used in other contexts in which cooperative sync is desired), TBB, and also boost::thread. – Maarten Hilferink Oct 01 '14 at 22:55
    I think popping resources from a queue inside a task would also start new threads when the queue is empty and (cooperatively) blocking and other tasks are still waiting to start. – Maarten Hilferink Oct 01 '14 at 22:57
  • `concurrent_queue` has no blocking pop method, only `try_pop()` which immediately returns if the queue is empty (see http://msdn.microsoft.com/en-us/library/ee355358.aspx). So you can e.g. make a spinning loop that calls `try_pop()` until success. – Alexey Kukanov Oct 02 '14 at 07:58
    So if I migrated to TBB, how could I then prevent a specific thread section (a part from the lambda function that I would provide to parallel_for) to create new threads, even if it cooperatively blocks? – Maarten Hilferink Oct 02 '14 at 14:05
    In TBB cooperative synchronization is not used. – Alexey Kukanov Oct 02 '14 at 14:06
    In the case of undersubscription due to a lot of blocking in parallel tasks, we recommend users to initialize TBB with more worker threads. Also, since synchronization is not cooperative in TBB and the task scheduler does not know of a lock being held, it's not recommended to use a nested parallel construct inside a critical section. So there might be some issues if your resource manager uses parallelism inside. – Alexey Kukanov Oct 02 '14 at 14:14

My answer is not "the" solution using PPL, but I think you could do this quite easily with a thread pool like taskqueue; you should have a look at this answer.

So you fill up the queue with your work items, and it ensures that no more than x tasks are working in parallel, where x is boost::thread::hardware_concurrency() (yes, Boost again...).

Jean Davy
    Thanks, I'll have another look at boost::threads although it doesn't seem to have a parallel_for, so I would have to split up a task set into a limited set of task ranges myself. – Maarten Hilferink Oct 02 '14 at 14:01