
An obvious (naive?) approach would be:

std::set<int> s;
for (int i = 0; i < SIZE; ++i) {
    s.insert(i);
}

That's reasonably readable, but from what I understand, not optimal since it involves repeatedly searching for the insertion position and does not take advantage of the fact that the input sequence is already sorted.

Is there a more elegant/efficient (or a de facto) way of initialising an std::set with a sequence of numbers?

Or, more generically, how does one efficiently insert an ordered list of entries into a collection?


Update:

Looking through the docs, I've just noticed the overload of insert() that accepts an iterator as a hint for the insertion position:

iterator insert ( iterator position, const value_type& x );

Which means this would be more efficient:

std::set<int> s;
std::set<int>::iterator it = s.begin();
for (int i = 0; i < SIZE; ++i) {
    it = s.insert(it, i);
}

That looks reasonable, but I'm still open to more suggestions.

Shawn Chin
  • Mmm, you should benchmark this if you really want to find the most efficient way to do it. Since the input is in ascending order, I feel like you're adding elements in the wrong position (sets keep stuff in ascending order, so begin() would point to the lowest number). Could you benchmark it? I'm really interested :D – mfontanini Jun 13 '12 at 14:45
  • @mfontanini I'm initialising an empty set so `begin()` ought to do the job. I'll see if I can cook up a quick benchmark, but I'm pretty certain the second version would be faster. – Shawn Chin Jun 13 '12 at 14:48
  • @mfontanini Quick benchmark on ideone: [naive](http://ideone.com/eu70e) (0.49s) vs [with hint](http://ideone.com/f1bfm) (0.24s) for `1000000` entries. – Shawn Chin Jun 13 '12 at 14:57
  • You're right. Thanks for testing it. – mfontanini Jun 13 '12 at 15:08
  • possible duplicate of [Is the STL map container optimized (balanced tree) while constructed?](http://stackoverflow.com/questions/10428342/is-the-stl-map-container-optimized-balanced-tree-while-constructed) – Fred Foo Jul 03 '12 at 12:49

5 Answers


The right iterator to use as the hint has changed between C++03 and C++11. With C++03, you want to use the position of the previous item (just as you and most of the replies have shown).

In C++11, you want to use the iterator to the item immediately after the one you're about to insert. When you're inserting in order, this makes things a bit simpler: you always use your_container.end():

std::set<int> s;
for (int i = 0; i < SIZE; ++i) 
    s.insert(s.end(), i);

You can, of course, use an algorithm (e.g., std::iota) or iterator (e.g., boost::counting_iterator, as @pmr already mentioned) to generate your values, but as far as the insertion itself goes, for a current implementation you want to use .end() as the hint, rather than the iterator returned by the previous insertion.

Jerry Coffin
  • Interesting. I thought the iterator for the hint could be one in the immediate neighborhood. – pmr Jun 13 '12 at 17:56
  • It can be anything close and still help, but the description has changed from: "iterator p is a hint pointing to where the insert should start to search." to: "t is inserted as close as possible to the position just prior to p." The complexity follows that. In C++03: "amortized constant if t is inserted right after p.", but in C++11: "amortized constant if t is inserted right before p." – Jerry Coffin Jun 13 '12 at 18:04
  • Strange decision. Do you have any hints where I can find a rationale for this change? I have a lot of APIs that follow the `inserted right after` scheme for hints, where the speed-up is essential. I also imagine it a lot more awkward to use than the code in my example. – pmr Jun 13 '12 at 20:35
  • According to the [defect report](http://groups.google.com/group/comp.std.c++/browse_frm/thread/6ff4ab58318f0870/52c2484d42d2a083), specifying it as "after" was originally an accident, and most implementations didn't work that way even when it was specified. – Jerry Coffin Jun 13 '12 at 21:07
  • I just tested pmr's code below, with the hint at .cbegin, .cend and your s.insert(s.end(),i); and your code is the fastest. Native is 83, .cend is 74.5, .cbegin is 66.5 and yours is 42 flat. Thanks. Well done! Set size was 51,200. – user2548100 Oct 23 '13 at 00:18
  • Does the same answer apply to `emplace` instead of `insert` as well? – Kagaratsch May 29 '21 at 14:05
  • @Kagaratsch: yes. – Jerry Coffin May 30 '21 at 03:50

The prettiest would be:

#include <set>
#include <boost/iterator/counting_iterator.hpp>

int main()
{
  const int SIZE = 100;
  std::set<int> s(boost::counting_iterator<int>(0), 
                  boost::counting_iterator<int>(SIZE));

  return 0;
}

If you aim for raw efficiency, using the hinted insert version could be helpful:

const int SIZE = 100;
std::set<int> s;
auto hint = s.begin();
for(int i = 0; i < SIZE; ++i)
  hint = s.insert(hint, i);

Being able to declare hint along with the counter would be nice and give us a clean scope, but this requires struct hackery which I find a little obfuscating.

std::set<int> s;
for(struct {int i; std::set<int>::iterator hint;} 
      st = {0, s.begin()};
    st.i < SIZE; ++(st.i))
  st.hint = s.insert(st.hint, st.i);
ks1322
pmr
  • Thanks! I was hoping for something like that when I posted the Q. – Shawn Chin Jun 13 '12 at 14:46
  • You can declare multiple variables in a single for loop. No need for structs – rubenvb Jun 13 '12 at 14:50
  • @pmr I guess so. The struct thing looks ugly as hell though, no offense. – rubenvb Jun 13 '12 at 15:00
  • @rubenvb: It is impossible, look http://stackoverflow.com/q/2687392/72178, `struct` can be used as work-around. – ks1322 Jun 13 '12 at 15:08
  • Performance of the [counting_iterator version](http://ideone.com/2HF4a) seems to be not far off the [hinted insert version](http://ideone.com/f1bfm). As expected, they both beat the pants off the [naive version](http://ideone.com/eu70e). – Shawn Chin Jun 13 '12 at 15:13
  • `struct` hackery version is instructive, but I'll probably stay away from it for now. I'll just confuse myself when I review the code few months down the road. – Shawn Chin Jun 13 '12 at 15:14
  • @pmr Appreciate your answer, but ended up using [Jerry's solution](http://stackoverflow.com/questions/11017200/efficiently-initialise-stdset-with-a-sequence-of-numbers/11018223#11018223) hence moved green tick thingy. Sorry. On the plus side, this can potentially lead to a [Populist badge](http://stackoverflow.com/badges/62/populist) :) – Shawn Chin Jun 13 '12 at 15:38
  • The auto iterator named hint is very pretty. Unfortunately, Jerry's code is faster, but still voting yours up. – user2548100 Oct 23 '13 at 00:24
  • Setting the `hint` to `s.begin()` in sorted input case would make its performance similar to normal insert method which doesn't use the position as argument right? – Mandeep Singh Jul 30 '19 at 09:00
#include <algorithm>
#include <set>
#include <iterator>

int main()
{
    std::set<int> s;
    int i = 0;
    std::generate_n(std::inserter(s, s.begin()), 10, [&i](){ return i++; });
}

This is (I think) equivalent to your second version, but IMHO looks much better.

C++03 version would be:

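#include <algorithm>
#include <iterator>
#include <set>

const int SIZE = 10;

// Generator functor; the counter is a static member, so it is shared by
// every copy of the functor that std::generate_n makes.
struct inc {
    static int i;
    explicit inc(int i_) { i = i_; }
    int operator()() { return i++; }
};

int inc::i = 0;

int main()
{
    std::set<int> s;
    std::generate_n(std::inserter(s, s.end()), SIZE, inc(0));
}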
jrok
  • Thanks. It certainly looks more l33t, but from the point of view of a C++ n00b, it's a lot harder to follow. Still, +1 for teaching me something new. – Shawn Chin Jun 13 '12 at 15:02
  • I tried it and it's in fact slower than hinted version, but slightly faster than naive (at least on ideone). – jrok Jun 13 '12 at 15:03
  • Hmm... can't seem to get it working [on ideone](http://ideone.com/7TadH). What did I miss? – Shawn Chin Jun 13 '12 at 15:08
  • I'd prefer a version with a `static` int member. – pmr Jun 13 '12 at 15:09

Well, you can use the version of set<>::insert() that takes a position as a hint for where the element is likely to be inserted.

iterator insert ( iterator position, const value_type& x );

Complexity: This version is logarithmic in general, but amortized constant if x is inserted right after the element pointed by position.

Shawn Chin
pravs
  • Thanks pravs. I did come across this moments after I posted the question, and have updated the post to reflect that. Still, +1. – Shawn Chin Jun 13 '12 at 14:51

This can be accomplished in a single line of code. The lambda's init-capture (a C++14 feature) initializes the variable i to 0, and the mutable specifier allows i to be updated within the lambda:

std::generate_n(std::inserter(s, s.begin()), SIZE, [i = 0]() mutable { return i++; });
claytonjwong