
I profiled my code using gprof, and most (if not all) of the top 20 or so entries in the report involve `vector<bool>`:

Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 14.71      0.05     0.05  3870399     0.00     0.00  std::vector<bool, std::allocator<bool> >::size() const
 11.76      0.09     0.04 10552897     0.00     0.00  std::_Bit_reference::_Bit_reference(unsigned long*, unsigned long)
 11.76      0.13     0.04  7890323     0.00     0.00  std::_Bit_const_iterator::_Bit_const_iterator(std::_Bit_iterator const&)
  5.88      0.15     0.02 10089215     0.00     0.00  std::_Bit_iterator::operator*() const
  5.88      0.17     0.02  6083600     0.00     0.00  std::vector<bool, std::allocator<bool> >::operator[](unsigned int)
  5.88      0.19     0.02  3912611     0.00     0.00  std::vector<bool, std::allocator<bool> >::end() const
  5.88      0.21     0.02                             std::istreambuf_iterator<char, std::char_traits<char> > std::num_get<char, std::istreambuf_iterator<char, std::char_traits<char> > >::_M_extract_int<unsigned long long>(std::istreambuf_iterator<char, std::char_traits<char> >, std::istreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, std::_Ios_Iostate&, unsigned long long&) const
  2.94      0.22     0.01  6523499     0.00     0.00  std::_Bit_reference::operator bool() const
  2.94      0.23     0.01  3940406     0.00     0.00  std::vector<bool, std::allocator<bool> >::begin() const
  2.94      0.24     0.01  2807828     0.00     0.00  std::_Bit_iterator::operator++()
  2.94      0.25     0.01   146917     0.00     0.00  std::_Bit_iterator_base::_M_incr(int)
  2.94      0.26     0.01   121706     0.00     0.00  std::__miter_base<unsigned long*, false>::__b(unsigned long*)
  2.94      0.27     0.01    46008     0.00     0.00  std::_Bvector_base<std::allocator<bool> >::~_Bvector_base()
  2.94      0.28     0.01    22596     0.00     0.00  std::_Bit_iterator std::__copy_move<false, false, std::random_access_iterator_tag>::__copy_m<std::_Bit_iterator, std::_Bit_iterator>(std::_Bit_iterator, std::_Bit_iterator, std::_Bit_iterator)
  2.94      0.29     0.01     4525     0.00     0.05  integer::operator+(integer)
  2.94      0.30     0.01     1382     0.01     0.01  void std::_Destroy<unsigned int*, unsigned int>(unsigned int*, unsigned int*, std::allocator<unsigned int>&)
  2.94      0.31     0.01                             std::string::size() const
  2.94      0.32     0.01                             std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
  2.94      0.33     0.01                             std::locale::locale()
  2.94      0.34     0.01                             __dynamic_cast

Is that a good sign, since it means that the rest of my functions are pretty efficient, or does it mean that accessing values from a `vector<bool>` is really slow?

I'm compiling with `gcc -std=c++0x`.

calccrypto
  • How about adding the column headings so that we don't have to guess at what we are reading? Next, tell us the compiler options that you are using. – Ed S. Jun 23 '11 at 22:44
  • Something strange here. Why does `vector::size()` take so much time per call? – Zaur Nasibov Jun 23 '11 at 22:46
  • mandatory change: `g++ --std=c++0x -g -O3` (that's **two** typos and the optimization flag; reprofile!). Template classes use inlining heavily, and this in turn enables a gazillion other optimizations. The speedup is easily 10-fold. – sehe Jun 23 '11 at 22:54
  • I'm not a terminal person; I'm using Code::Blocks. Sorry. – calccrypto Jun 23 '11 at 22:56
  • @calccrypto: what are you saying? Do you, or do you not have optimizations enabled? – sehe Jun 23 '11 at 22:57
  • Yes, I enabled them in the Code::Blocks GUI, not by typing them myself. I don't actually know the syntax. – calccrypto Jun 23 '11 at 22:58
  • possible duplicate of [Why vector::reference doesn't return reference to bool?](http://stackoverflow.com/questions/8399417/why-vectorboolreference-doesnt-return-reference-to-bool) – BЈовић Feb 12 '13 at 08:42

5 Answers


`vector<bool>` does not store bools. It's basically a bitfield, so you're paying for the bit twiddling it takes to read or modify a single value.

If runtime performance is a concern, consider `vector<char>` or `deque<bool>` instead.

Billy ONeal
  • `vector<bool>` should never have made it into the standard in its current form. Unfortunately we're stuck with it. – Mark Ransom Jun 23 '11 at 22:50
  • @Mark Ransom: [The C++ committee agrees with you](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html). After all, [`vector<bool>` isn't even a container](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2130.html#96). – In silico Jun 23 '11 at 22:54
  • @Insilico: Well, they must only agree to a limited extent, since `vector<bool>` is still in the FDIS. ;-] – ildjarn Jun 23 '11 at 23:13
  • @ildjarn: The paper In silico linked to was published in 2007. The committee stopped accepting papers for C++0x in (I think) 2006. – Billy ONeal Jun 23 '11 at 23:20
  • Ah, interesting. Well, let's hope it makes it into TR2. – ildjarn Jun 23 '11 at 23:21

since it means that the rest of my functions are pretty efficient, or that accessing values from a `vector<bool>` is really slow?

As 'slow' and 'efficient' are relative terms, this is essentially a meaningless distinction. The most objective way to interpret the report is:

Since `std::vector<bool>` operations eat up the most significant share of the time, that is the place to start in making the code faster.

Note that `std::vector<bool>` is generally a bit slower than a `std::vector<int>` because it does not store real bools but rather a set of bitmasks (ideally, it needs only one bit per entry). This saves space but costs time. If you need it to be faster, try `std::vector<int>` (or `char`, etc., depending on your needs) instead.

I would also suspect that `std::vector<bool>` suffers greatly in debug builds, so try some optimization flags if you haven't already (you always should when profiling).

Alexander Gessler

`vector<bool>` is actually a template specialization in which each bool value is stored as a single bit. However, it's not possible to work directly with individual bits the way you can with `int` or a "normal" `bool`. So the algorithms used in `vector<bool>` are very different from those of a "normal" `vector<>`, and in order to preserve the vector interface as much as possible it returns proxy objects that manipulate the bits when you call functions like `operator[]`. That may contribute to the results in your gprof report, depending on how your compiler is configured and the code in question.

In silico

I'd say it stinks, because 14.71% of your time is spent in `vector<bool>::size()`!? The size is probably a given.

Try to reduce the number of calls to `size()`, or use a fixed-size container if you know the size up front: `std::bitset`.

Edit after reading the update to the question:

Mandatory change: `g++ --std=c++0x -g -O3` (that's **two** typos and the optimization flag; reprofile!). Template classes use inlining heavily, and this in turn enables a gazillion other optimizations. The speedup is easily 10-fold.

sehe

Does it say much about your program? Other than the `vector<bool>` business, it's telling you basically nothing.

You're seeing first hand the problems with gprof.

Suppose you know some function has high "self time", meaning the program counter was sampled a good number of times in it, but it's not a function you wrote or can modify.

The only thing you can do about it is call it less often, or call the routine that calls it less often, and so on up the call chain; you're left trying to guess where to make that change.

gprof tries to help by also estimating each routine's inclusive time, its call count, and a call graph. If there's no recursion, you've only got a dozen or so functions, and you're not doing any I/O, this may be helpful.

There's a slightly different approach, embodied in profilers like Zoom. Instead of sampling just the program counter, sample the whole call stack. Why? Because the lines of code responsible for the time being spent are on the stack during that time, just asking to be noticed.

Profilers that sample the call stack, on wall clock time, and tell you which lines of code are found on the stack most of the time, are the most effective. Even more effective is if you can look over individual samples of the stack, because that also tells you why those lines are being invoked, not just how much, so it's easy to tell if you don't really need them.

Mike Dunlavey