Which std::async implementations use thread pools?

Question

One of the advantages of using std::async instead of manually creating std::thread objects is supposed to be that std::async can use thread pools under the covers to avoid oversubscription problems. But which implementations do this? My understanding is that Microsoft's implementation does, but what about these other async implementations?

Gnu's libstdc++
LLVM's libc++
Just Software's library
Boost (for boost::thread::async, not std::async)

Thanks for any information you can offer.

Thread pools might not actually be a legal implementation due to things like `thread_local`; e.g. `async` is required to run either on the same thread or as if on a new thread, which would mean that constructors for `thread_local` variables run. VC++ doesn't implement `thread_local` yet but I'm not sure how they'll create a conformant implementation using thread pools. — bames53, Mar 27 '13 at 18:30
@bames: `thread_local` doesn't have to be a naive reference to the OS thread local storage. The runtime can keep track of thread-local objects and cause them to be destroyed/reset when an `async` worker thread is returned to the pool, as if the thread were ending. — Ben Voigt, Mar 27 '13 at 19:06
@bames53: Whether thread pools is a legal implementation is actually a separate question. (I believe the standard was written with the deliberate intent that thread pools be permitted.) My question is which implementations actually use thread pools. That question stands, even if the question of their validity remains to be established. — KnowItAllWannabe, Mar 27 '13 at 19:22

Evgeny Panasyuk · Answer 1 · 2015-04-16T00:18:13.080

Black-Box Test

Though "white box" check can be done via inspecting boost, libstdc++ or libc++ sources, or checking documentation like just::thread or MSVC Concurrency Runtime, but I can not deny myself the pleasure of writing C++11 code!

I have made black-box test:

LIVE DEMO

#define BOOST_THREAD_PROVIDES_FUTURE
#define BOOST_RESULT_OF_USE_DECLTYPE
#include <boost/exception/exception.hpp>
#include <boost/range/algorithm.hpp>
#include <boost/move/iterator.hpp>
#include <boost/phoenix.hpp>
#include <boost/thread.hpp>
#include <boost/config.hpp>

#include <unordered_set>
#include <functional>
#include <algorithm>
#include <iterator>
#include <iostream>
#include <ostream>
#include <cstddef>
#include <string>
#include <vector>
#include <future>
#include <thread>
#include <chrono>
#include <mutex>

// _____________________[CONFIGURATION]________________________ //
namespace async_lib = std;
const bool work_is_sleep = false;
// ____________________________________________________________ //

using namespace std;
using boost::phoenix::arg_names::arg1;
using boost::thread_specific_ptr;
using boost::back_move_inserter;
using boost::type_name;
using boost::count_if;
using boost::copy;

template<typename Mutex>
unique_lock<Mutex> locker(Mutex &m)
{
    return unique_lock<Mutex>(m);
}

void do_work()
{
    if(work_is_sleep)
    {
        this_thread::sleep_for( chrono::milliseconds(20) );
    }
    else
    {
        volatile double result=0.0;
        for(size_t i=0; i!=1<<22; ++i)
            result+=0.1;
    }
}

int main()
{
    typedef thread::id TID;
    typedef async_lib::future<TID> FTID;

    unordered_set<TID> tids, live_tids;
    vector<FTID> ftids;
    vector<int> live_tids_count;
    async_lib::mutex m;
    generate_n
    (
        back_move_inserter(ftids), 64*thread::hardware_concurrency(),
        [&]()
        {
            return async_lib::async([&]() -> TID
            {
                static thread_specific_ptr<bool> fresh;
                if(fresh.get() == nullptr)
                    fresh.reset(new bool(true));
                TID tid = this_thread::get_id();
                locker(m),
                    live_tids.insert(tid),
                    live_tids_count.push_back(int(live_tids.size()) * (*fresh ? -1 : 1));
                do_work();
                locker(m),
                    live_tids.erase(tid);
                *fresh = false;
                return tid;
            });
        }
    );
    transform
    (
        begin(ftids), end(ftids),
        inserter(tids, tids.end()),
        [](FTID &x){return x.get();}
    );

    cout << "Compiler = " << BOOST_COMPILER                             << endl;
    cout << "Standard library = " << BOOST_STDLIB                       << endl;
    cout << "Boost = " << BOOST_LIB_VERSION                             << endl;
    cout << "future type = " << type_name<FTID>()                       << endl;
    cout << "Only sleep in do_work = " << boolalpha << work_is_sleep    << endl;
    cout << string(32,'_')                                              << endl;
    cout << "hardware_concurrency = " << thread::hardware_concurrency() << endl;
    cout << "async count = " << ftids.size()                            << endl;
    cout << "unique thread id's = " << tids.size()                      << endl;
    cout << "live threads count (negative means fresh thread):"                ;
    copy(live_tids_count, ostream_iterator<int>(cout," "));        cout << endl;
    cout << "fresh count = " << count_if(live_tids_count, arg1 < 0)     << endl;
}

Compiler = Microsoft Visual C++ version 11.0
Standard library = Dinkumware standard library version 540
Boost = 1_53
future type = class std::future<class std::thread::id>
Only sleep in do_work = false
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 4
live threads count (negative means fresh thread):-1 -2 2 2 -3 -4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 3 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
fresh count = 4

Compiler = Microsoft Visual C++ version 11.0
Standard library = Dinkumware standard library version 540
Boost = 1_53
future type = class std::future<class std::thread::id>
Only sleep in do_work = true
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 34
live threads count (negative means fresh thread):-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 18 18 18 18 18 18 18 18 18 18 18 18 18
 18 18 18 18 -19 19 -20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 -21 21 21 -22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 -
23 23 23 23 -24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 24 -25 25 25 25 25 -26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
26 26 26 -27 27 27 27 27 27 -28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 28 -29 29 29 29 29 29 29 -30 30 30 30 30 30 30 30 30 30
 30 30 30 30 30 30 30 30 30 30 30 30 30 30 -31 31 31 31 31 31 31 31 -32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 -33 33
 33 33 33 33 33 33 33 -34 34 34 34 34 34 34 34 34 11 12 11 12 13 14 15 16 15 10 11 12 13 14
fresh count = 34

Compiler = Microsoft Visual C++ version 11.0
Standard library = Dinkumware standard library version 540
Boost = 1_53
future type = class boost::future<class std::thread::id>
Only sleep in do_work = false
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 256
live threads count (negative means fresh thread):-1 -2 -2 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -3 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -3 -4 -4 -4 -4 -4 -4 -4
 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -4 -2 -3 -3 -3 -3 -4 -4 -4 -4 -2 -2 -2 -2 -3
 -2 -2 -3 -1 -2 -2 -3 -3 -1 -2 -1 -2 -3 -2 -2 -2 -2 -1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -3 -3 -2 -2 -2 -2 -3 -3 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -3 -1 -2 -1 -2 -1 -2 -1 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -3 -3 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -3 -3 -2 -2 -3 -1 -2 -1 -2 -2 -2 -3 -2 -3 -1 -2 -2 -2 -3 -2 -3 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2
 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -2 -3 -3 -3 -2 -2 -2 -3
fresh count = 256

Compiler = Microsoft Visual C++ version 11.0
Standard library = Dinkumware standard library version 540
Boost = 1_53
future type = class boost::future<class std::thread::id>
Only sleep in do_work = true
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 256
live threads count (negative means fresh thread):-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -2
8 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -60 -61 -62 -63 -64 -65
-66 -67 -68 -69 -70 -71 -72 -73 -74 -75 -76 -77 -78 -79 -80 -81 -82 -83 -84 -85 -86 -87 -88 -89 -90 -91 -92 -93 -94 -95 -96 -97 -98 -99 -100 -101 -102
 -103 -104 -105 -106 -107 -108 -109 -110 -111 -112 -113 -114 -115 -116 -117 -118 -119 -120 -121 -122 -123 -124 -125 -126 -127 -128 -129 -130 -131 -132
 -133 -134 -135 -136 -137 -138 -139 -140 -141 -142 -143 -144 -145 -146 -147 -148 -149 -150 -151 -152 -153 -154 -155 -156 -157 -158 -159 -160 -161 -162
 -163 -164 -165 -166 -167 -168 -169 -170 -171 -172 -173 -174 -175 -176 -177 -178 -179 -180 -181 -182 -183 -184 -185 -186 -187 -188 -189 -190 -191 -192
 -193 -194 -195 -196 -197 -198 -199 -200 -201 -202 -203 -204 -205 -206 -207 -208 -209 -210 -211 -212 -213 -214 -215 -216 -217 -218 -219 -220 -221 -222
 -223 -224 -225 -226 -227 -228 -229 -230 -231 -232 -233 -234 -235 -236 -237 -238 -239 -240 -241 -242 -243 -244 -245 -246 -247 -248 -249 -250 -251 -252
 -253 -254 -255 -256
fresh count = 256

Compiler = GNU C++ version 4.8.0-alpha20121216 20121216 (experimental)
Standard library = GNU libstdc++ version 20121216
Boost = 1_53
future type = std::future<std::thread::id>
Only sleep in do_work = false
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 1
live threads count (negative means fresh thread):-1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
fresh count = 1

Compiler = GNU C++ version 4.8.0-alpha20121216 20121216 (experimental)
Standard library = GNU libstdc++ version 20121216
Boost = 1_53
future type = boost::future<std::thread::id>
Only sleep in do_work = false
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 122
live threads count (negative means fresh thread):-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -20 -21 -22 -23 -24 -25 -25 -26 -27 -28 -29 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -59 -60 -61 -62 -63 -64 -64 -64 -63 -61 -55 -48 -41 -33 -34 -27 -27 -28 -29 -29 -30 -31 -31 -30 -28 -29 -28 -28 -28 -29 -30 -30 -31 -30 -31 -31 -31 -32 -31 -26 -24 -25 -26 -25 -23 -23 -22 -20 -21 -19 -20 -20 -19 -20 -21 -21 -22 -21 -22 -23 -24 -24 -25 -26 -27 -25 -25 -25 -22 -23 -22 -23 -23 -24 -25 -26 -27 -27 -28 -29 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -51 -52 -53 -54 -55 -56 -57 -56 -54 -3 -3 -4 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -55 -56 -57 -58 -59 -60 -61 -61 -61 -62 -63 -64 -65 -64 -62 -63 -63 -61 -62 -61 -59 -60 -59 -57 -55 -56
fresh count = 256

Compiler = Clang version 3.2 (tags/RELEASE_32/final)
Standard library = libc++ version 1101
Boost = 1_53
future type = NSt3__16futureINS_11__thread_idEEE
Only sleep in do_work = false
________________________________
hardware_concurrency = 4
async count = 256
unique thread id's = 255
live threads count (negative means fresh thread):-1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -60 -61 -62 -63 -64 -65 -66 -67 -68 -69 -69 -70 -71 -72 -73 -74 -75 -76 -77 -78 -79 -80 -81 -82 -83 -84 -85 -86 -87 -88 -89 -90 -91 -92 -93 -94 -95 -96 -97 -98 -99 -100 -101 -102 -102 -103 -104 -105 -106 -107 -108 -109 -110 -111 -112 -113 -114 -115 -116 -117 -118 -119 -120 -121 -121 -122 -122 -123 -123 -124 -125 -126 -127 -126 -127 -128 -129 -129 -128 -129 -128 -129 -129 -128 -129 -129 -128 -126 -127 -128 -128 -129 -129 -130 -129 -129 -128 -129 -129 -130 -131 -132 -133 -134 -135 -136 -134 -132 -133 -134 -134 -133 -132 -133 -132 -133 -134 -133 -131 -129 -127 -124 -125 -121 -119 -120 -118 -119 -118 -117 -115 -111 -107 -105 -106 -103 -100 -97 -95 -96 -94 -90 -87 -81 -73 -74 -71 -72 -73 -74 -75 -70 -71 -66 -60 -59 -60 -61 -62 -63 -64 -61 -58 -55 -55 -52 -53 -54 -54 -55 -56 -56 -57 -54 -55 -56 -56 -57 -57 -58 -56 -54 -55 -56 -56 -57 -58 -58 -59 -58 -58 -58 -58 -59 -60
fresh count = 256

As can be seen - MSVC does reuse threads, i.e. it is very likely that it's scheme is some kind of thread pool.

This is interesting, but I'm not sure what we can conclude from it. For example, if we see thread ID n being used more than once, that could indicate that that thread was part of a pool, but it could also indicate that a thread with that ID was started, ran to completion, was destroyed, and then the ID was reused by a new thread. Or am I misreading the code or mis-analyzing the situation? — KnowItAllWannabe, Apr 03 '13 at 04:17
@KnowItAllWannabe, what you are saying is possible, and I thought about such situation. I am also trying to make some conclusion based on these results. Let's start with definitely obvious: libc++, libstdc++, and Boost+{GCC,MSVC} - do not use thread pools. Only MSVC *possibly* uses thread pool. Do you agree? — Evgeny Panasyuk, Apr 03 '13 at 10:54
MSVC's `std::async` uses PPL under the hood, as far as I'm aware, and that has a work-stealing runtime with a thread pool (IIRC). — Xeo, Apr 08 '13 at 14:48
@Xeo - yes, that why I added reference to [MSVC Concurrency Runtime](http://msdn.microsoft.com/en-us/library/dd984036.aspx) at top - it says "The Task Scheduler uses cooperative scheduling and a work-stealing algorithm together with the preemptive scheduler of the operating system to achieve maximum usage of processing resources." — Evgeny Panasyuk, Apr 08 '13 at 14:56
@KnowItAllWannabe: That is possible but doesn't agree with the observations. If the OS were aggressively reusing thread IDs, then the other implementations would be very likely to see at least some reuse as well. — Ben Voigt, Sep 12 '14 at 18:34
You should be calling `async_lib::async(async_lib::launch::async, []() ...)` to ensure the function isn't deferred, since creating a deferred function tells you nothing useful. (With all current releases of GCC omitting the launch policy causes the function to be deferred.) — Jonathan Wakely, Sep 12 '14 at 18:43

Which std::async implementations use thread pools?

1 Answers1

Black-Box Test

Linked