23

I notice that vector is much slower than bool array when running the following code.

int main() 
{
    int count = 0;
    int n = 1500000;
    // slower with c++ vector<bool>
    /*vector<bool> isPrime;
    isPrime.reserve(n);
    isPrime.assign(n, true);
    */
    // faster with bool array 
    bool* isPrime = new bool[n];

    for (int i = 0; i < n; ++i)
        isPrime[i] = true;


    for (int i = 2; i< n; ++i) {
        if (isPrime[i])
            count++;
        for (int j =2; i*j < n; ++j )
            isPrime[i*j] = false;
    }

    cout <<  count << endl;
    return 0;
}

Is there some way that I can do to make vector<bool> faster ? Btw, both std::vector::push_back and std::vector::emplace_back are even slower than std::vector::assign.

manlio
  • 16,658
  • 13
  • 67
  • 107
guoqing
  • 249
  • 2
  • 5

3 Answers3

16

std::vector<bool> can have various performance issues (e.g. take a look at https://isocpp.org/blog/2012/11/on-vectorbool).

In general you can:

  • use std::vector<std::uint8_t> instead of std::vector<bool> (give a try to std::valarray<bool> also).

    This requires more memory and is less cache-friendly but there isn't a overhead (in the form of bit manipulation) to access a single value, so there are situations in which it works better (after all it's just like your array of bool but without the nuisance of memory management)

  • use std::bitset if you know at compile time how large your boolean array is going to be (or if you can at least establish a reasonable upper bound)
  • if Boost is an option try boost::dynamic_bitset (the size can be specified at runtime)

But for speed optimizations you have to test...

With your specific example I can confirm a performance difference only when optimizations are turned off (of course this isn't the way to go).

Some tests with g++ v4.8.3 and clang++ v3.4.5 on an Intel Xeon system (-O3 optimization level) give a different picture:

                    time (ms)
                 G++      CLANG++
array of bool    3103     3010
vector<bool>     2835     2420    // not bad!
vector<char>     3136     3031    // same as array of bool
bitset           2742     2388    // marginally better

(time elapsed for 100 runs of the code in the answer)

std::vector<bool> doesn't look so bad (source code here).

manlio
  • 16,658
  • 13
  • 67
  • 107
  • 1
    "Xeon" can be anything from P4 to Skylake. Saying *which* Xeon (e.g. Haswell Xeon, or Exxxx v3) would be much more informative. Those are pretty old compiler versions for modern hardware, too (not as big a deal if you aren't auto-vectorizing or using `-march=native`). – Peter Cordes Apr 30 '16 at 18:54
  • 1
    @PeterCordes you're right. It's a Xeon e3-1230v3 (using `-march=native` switch). In my defence I would add that it wasn't meant to be a complete examination, but I'll add add test code and some more results as soon as I can. – manlio Apr 30 '16 at 20:50
  • You don't have to make a big deal out of timing this. Just any time you post any perf numbers, the specific microarch and compiler version + options are essential. e.g. clang++ 3.4.5 on a Haswell Xeon, `-O3 -march=native` would cover it. BTW, clang 3.7.1 (current stable) auto-vectorizes for AVX2 significantly better than 3.4, in terms of the quality of the asm. IDK how often that makes a perf diff, since memory bottlenecks often hide CPU bottlenecks. http://llvm.org/apt/ has current versions of clang for debian-based Linux distros. – Peter Cordes Apr 30 '16 at 22:29
  • I confirm that the performance benefit of `std::vector` is not only observed on x86 processors, but also on my raspberry pi with gcc 8.3, with just a simple `-O2` switch. It's like 0.5s vs. 2.5s difference, with `std::vector` being 0.5s. – Samuel Li Jan 01 '20 at 21:34
10

vector<bool> may have a template specialization and may be implemented using bit array to save space. Extracting and saving a bit and converting it from / to bool may cause the performance drop you are observing. If you use std::vector::push_back, you are resizing the vector which will cause even worse performance. Next performance killer may be assign (Worst complexity: Linear of first argument), instead use operator [] (Complexity: constant).

On the other hand, bool [] is guaranteed to be array of bool.

And you should resize to n instead of n-1 to avoid undefined behaviour.

Mohit Jain
  • 29,414
  • 8
  • 65
  • 93
6

vector<bool> can be high performance, but isn't required to be. For vector<bool> to be efficient, it needs to operate on many bools at a time (e.g. isPrime.assign(n, true)), and the implementor has had to put loving care into it. Indexing individual bools in a vector<bool> is slow.

Here is a prime finder that I wrote a while back using vector<bool> and clang + libc++ (the libc++ part is important):

#include <algorithm>
#include <chrono>
#include <iostream>
#include <vector>

std::vector<bool>
init_primes()
{
    std::vector<bool> primes(0x80000000, true);
    primes[0] = false;
    primes[1] = false;
    const auto pb = primes.begin();
    const auto pe = primes.end();
    const auto sz = primes.size();
    size_t i = 2;
    while (true)
    {
        size_t j = i*i;
        if (j >= sz)
            break;
        do
        {
            primes[j] = false;
            j += i;
        } while (j < sz);
        i = std::find(pb + (i+1), pe, true) - pb;
    }
    return primes;
}

int
main()
{
    using namespace std::chrono;
    using dsec = duration<double>;
    auto t0 = steady_clock::now();
    auto p = init_primes();
    auto t1 = steady_clock::now();
    std::cout << dsec(t1-t0).count() << "\n";
}

This executes for me in about 28s (-O3). When I change it to return a vector<char> instead, the execution time goes up to about 44s.

If you run this using some other std::lib, you probably won't see this trend. On libc++ algorithms such as std::find have been optimized to search a word of bits at a time, instead of bit at a time.

See http://howardhinnant.github.io/onvectorbool.html for more details on what std algorithms could be optimized by your vendor.

Howard Hinnant
  • 179,402
  • 46
  • 391
  • 527