29

I find the C++ STL method of doing simple set operations quite clunky to use. For example, to find the difference between two sets:

std::set<int> newUserIds;
set_difference(currentUserIds.begin(), currentUserIds.end(), mPreviousUserIds.begin(), mPreviousUserIds.end(), std::inserter(newUserIds, newUserIds.end()));
std::set<int> missingUserIds;
set_difference(mPreviousUserIds.begin(), mPreviousUserIds.end(), currentUserIds.begin(), currentUserIds.end(), std::inserter(missingUserIds, missingUserIds.end()));
mPreviousUserIds = currentUserIds;

Does boost offer an alternative set of classes that would reduce the above example to something like this:

set_type<int> newUserIds = currentUserIds.difference(mPreviousUserIds);
set_type<int> missingUserIds = mPreviousUserIds.difference(currentUserIds);

(Similar to QSet in Qt which overrides operator- in this way.)

Tim MB
  • 5
    It's a five-finger exercise to write that if you want it. – Pete Becker Feb 26 '13 at 14:02
  • 2
    It is but it leaves me with a personal base of code to import into any project where I need it, which isn't always possible (e.g. at work) and makes it harder for others to understand the code. – Tim MB Feb 26 '13 at 14:41

3 Answers

89

Nope. But here is how to clean it up.

First, rewrite the iterator-based functions as range-based functions. This halves your boilerplate.

Second, have them return container builders rather than take insert iterators: this gives you efficient assignment syntax.

Third, and probably too far, write them as named operators.

The final result is you get:

set<int> s = a *intersect* b;
set<int> s2 = c -difference- s;
set<int> s3 = a *_union_* (b *intersect* s -difference- s2);

... after writing a boatload of boilerplate code elsewhere.

As far as I know, boost does step 1.

But each of the above three stages should reduce your boilerplate significantly.
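As a rough sketch of where the first two steps lead, here is a plain helper that takes whole sets and returns the result by value. The name `difference` is hypothetical (it is neither an STL nor a Boost function), and the sketch assumes an ordered associative container such as std::set, since it relies on `key_comp()` and on sorted iteration.

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical helper, not part of the STL or Boost.
// Takes whole containers instead of iterator pairs and returns the
// elements of `a` that are not in `b`.
// Assumes Set is an ordered associative container (e.g. std::set), so both
// inputs are already sorted by the same comparator.
template <class Set>
Set difference(const Set& a, const Set& b) {
    Set result;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(result, result.end()),
                        a.key_comp());
    return result;
}
```

With this in place, the code from the question would read `std::set<int> newUserIds = difference(currentUserIds, mPreviousUserIds);`. It is less general than the builder approach described next, because the result type is fixed to the input type.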

Container builder:

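template<typename Functor>
struct container_builder {
  Functor f;
  // Implicit conversion to any container that supports back insertion.
  template<typename Container, typename=typename std::enable_if<is_back_insertable<Container>::value>::type>
  operator Container() const {
    Container retval;
    using std::back_inserter; // enable ADL for containers with their own back_inserter
    f( back_inserter(retval) ); // f writes its results through the output iterator
    return retval;
  }
  container_builder(Functor const& f_):f(f_) {}
};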

which requires writing is_back_insertable (pretty standard SFINAE).

You wrap your range-based (or iterator-based) functor that takes a back_insert_iterator as the last argument, and use std::bind to bind the input parameters, leaving the last one free. Then pass that to container_builder, and return it.

container_builder can then be implicitly cast to any container that accepts std::back_inserter (or has its own ADL back_inserter), and move semantics on every std container makes the construct-then-return quite efficient.
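The following is a minimal sketch of that idea in C++14, with a few deliberate departures from the code above, all of which are my assumptions: it uses std::inserter rather than std::back_inserter so that std::set (which has no push_back) also works, it drops the SFINAE guard for brevity, and it uses a lambda instead of std::bind. The names `make_builder` and `lazy_difference` are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <utility>
#include <vector>

// Variation on container_builder: std::inserter instead of
// std::back_inserter, so it works for std::set as well as std::vector.
template <class Functor>
struct container_builder {
    Functor f;

    // Implicit conversion to any default-constructible container that
    // supports insert(iterator, value).
    template <class Container>
    operator Container() const {
        Container result;
        f(std::inserter(result, result.end()));
        return result;
    }
};

// Hypothetical factory that deduces the functor type.
template <class Functor>
container_builder<Functor> make_builder(Functor f) {
    return container_builder<Functor>{std::move(f)};
}

// Hypothetical wrapper around std::set_difference. It captures the inputs
// by reference, so the builder must be converted to a container before
// either input goes out of scope. Both inputs must be sorted.
template <class A, class B>
auto lazy_difference(const A& a, const B& b) {
    return make_builder([&a, &b](auto out) {
        std::set_difference(a.begin(), a.end(), b.begin(), b.end(), out);
    });
}
```

The target container is chosen at the point of assignment, so `std::set<int> s = lazy_difference(a, b);` and `std::vector<int> v = lazy_difference(a, b);` both work with copy-initialization as written here.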

Here is my dozen line named operator library:

namespace named_operator {
  template<class D>struct make_operator{make_operator(){}};

  template<class T, char, class O> struct half_apply { T&& lhs; };

  template<class Lhs, class Op>
  half_apply<Lhs, '*', Op> operator*( Lhs&& lhs, make_operator<Op> ) {
    return {std::forward<Lhs>(lhs)};
  }

  template<class Lhs, class Op, class Rhs>
  auto operator*( half_apply<Lhs, '*', Op>&& lhs, Rhs&& rhs )
  -> decltype( named_invoke( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) ) )
  {
    return named_invoke( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) );
  }
}

Live example using it to implement vector *concat* vector. It only supports one operator, but extending it is easy. For serious use, I'd advise having a times function that by default calls invoke for *blah*, an add function for +blah+ that does the same, and so on. <blah> can call invoke directly.

The client programmer can then provide either an operator-specific overload or the general invoke, and it works.

Here is a similar library being used to implement *then* on both tuple-returning functions and futures.

Here is a primitive *in*:

namespace my_op {
  struct in_t:named_operator::make_operator<in_t>{};
  in_t in;

  template<class E, class C>
  bool named_invoke( E const& e, in_t, C const& container ) {
    using std::begin; using std::end;
    return std::find( begin(container), end(container), e ) != end(container);
  }
}
using my_op::in;

live example.
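For reference, the two snippets above can be combined into a single file roughly as follows. This is only a sketch that repeats the code from this answer with the headers it needs; nothing beyond the includes and comments is new. Note that `named_invoke` is not forward-declared in `named_operator`: it is found by argument-dependent lookup through the `in_t` tag at the point of use.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

namespace named_operator {
  // Tag base class; operators derive from make_operator<Self>.
  template<class D> struct make_operator { make_operator() {} };

  // Result of `lhs *op`: remembers the left operand until the right arrives.
  template<class T, char, class O> struct half_apply { T&& lhs; };

  template<class Lhs, class Op>
  half_apply<Lhs, '*', Op> operator*( Lhs&& lhs, make_operator<Op> ) {
    return {std::forward<Lhs>(lhs)};
  }

  // `(lhs *op) * rhs`: dispatches to named_invoke, found via ADL on Op.
  template<class Lhs, class Op, class Rhs>
  auto operator*( half_apply<Lhs, '*', Op>&& lhs, Rhs&& rhs )
  -> decltype( named_invoke( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) ) )
  {
    return named_invoke( std::forward<Lhs>(lhs.lhs), Op{}, std::forward<Rhs>(rhs) );
  }
}

namespace my_op {
  struct in_t : named_operator::make_operator<in_t> {};
  in_t in;

  // Linear search; works for anything std::begin/std::end accept.
  template<class E, class C>
  bool named_invoke( E const& e, in_t, C const& container ) {
    using std::begin; using std::end;
    return std::find( begin(container), end(container), e ) != end(container);
  }
}
using my_op::in;
```

Given `std::vector<int> v{1, 2, 3};`, the expression `3 *in* v` is true and `7 *in* v` is false.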

Yakk - Adam Nevraumont
  • 31
    Whoa. Named operators. Mind. blown. Why have I never thought of this before? – Konrad Rudolph Feb 26 '13 at 14:08
  • 1
    What the...? Now that is an interesting thing (even if more from a theoretical point of view, I guess, but who knows). – Christian Rau Feb 26 '13 at 14:25
  • 2
    @KonradRudolph Here, have an [implementation](http://liveworkspace.org/code/12lxjX$40). `make_infix(arbitrary_binary_functor)` returns a named operator on `*`. (everything below `make_infix` is various kinds of test code) – Yakk - Adam Nevraumont Feb 26 '13 at 14:26
  • 1
    @Yakk The implementation was easy once you named the concept. … although I think I’d use `<op>` or `%op%` (the latter because it’s the syntax R uses). – Konrad Rudolph Feb 26 '13 at 15:01
  • 1
    Well... thank you for this. Not quite what I had in mind (!) but an interesting insight into what's possible in deep C++. – Tim MB Feb 26 '13 at 15:01
  • 5
    @KonradRudolph the precedence of a named operator in C++ is the same as the "surrounding" operators. `%op%` thus ties with other multiplication/division operators in precedence, and `<op>` is looser than anything except assignment, logical, and `==` type operations. `v1 *dot* v2 + v3` follows expected precedence rules, as does `v0 *dot* v1 v2 *dot* v3`. Given that any binary operator works, and it has meaning, I simply let the user choose -- so `make_infix'>(func)` gives you your syntax. A nice side effect is `vec1 +append= vec2;` syntax reads real pretty as well. – Yakk - Adam Nevraumont Feb 26 '13 at 20:17
  • 1
    @Yakk Yes, I noticed that. See the (very brief) discussion in [my question on Code Review](http://codereview.stackexchange.com/q/23179/308). – Konrad Rudolph Feb 26 '13 at 22:47
  • Can you make an isolated example (one file only) that implements an `in` operator? – noɥʇʎԀʎzɐɹƆ Jan 20 '17 at 00:47
  • 1
    @noɥʇʎԀʎzɐɹƆ added to answer – Yakk - Adam Nevraumont Jan 20 '17 at 01:18
  • You need to forward-declare `named_invoke`. – noɥʇʎԀʎzɐɹƆ Jan 20 '17 at 01:29
  • 1
    @noɥʇʎԀʎzɐɹƆ I provided code that compiles on both gcc and clang that does not in the live example link. Why do you think forward declaration is required? ADL covers a multitude of sins. – Yakk - Adam Nevraumont Jan 20 '17 at 01:50
  • 2
    @noɥʇʎԀʎzɐɹƆ Get a better static code analyzer. Or just write `template<class... Ts> void named_invoke(Ts&&...) = delete;` in the named operator namespace before use, which could shut it up. Might not; is really working around a bug in the analyzer, so who knows what I need to do. – Yakk - Adam Nevraumont Jan 20 '17 at 01:51
  • 1
    @noɥʇʎԀʎzɐɹƆ It just does whatever the code you put into the `named_invoke` does. In this case, I wrote a linear find. You can write overloads, fancy SFINAE based code, or whatever in it. You can make the container be a binary tree, a linear array, a sorted array, a hash table, or a myriad of other options. The performance is ... whatever you write. – Yakk - Adam Nevraumont Jan 20 '17 at 03:50
  • 1
    @noɥʇʎԀʎzɐɹƆ There is no reason for there to be any overhead after inlining over a "direct" call to `named_apply`, except maybe blocking an elision and replacing it with a `move` if your `named_apply` takes by value. Everything else is expression templates, fancy, but there isn't a reason for them to actually exist at runtime. – Yakk - Adam Nevraumont Jan 20 '17 at 18:18
  • @Yakk But statistics speak louder – noɥʇʎԀʎzɐɹƆ Jan 20 '17 at 18:20
  • 2
    @noɥʇʎԀʎzɐɹƆ [Look 10^18 iterations in zero time](http://coliru.stacked-crooked.com/a/3bc042c3d3b5ed00). If you want help profiling something that compiles away to nothing, ask a SO question about it. There is nothing to profile, the named operator code doesn't exist in the output of an optimizing compiler. I'm sure there are compilers that fail to optimize it or cases where it cannot be done, but that is a quality of implementation issue in that particular compiler. – Yakk - Adam Nevraumont Jan 20 '17 at 18:27
  • OK, but noop doesn't really count, does it? – noɥʇʎԀʎzɐɹƆ Jan 20 '17 at 18:30
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/133669/discussion-between-nohtyyzar-and-yakk). – noɥʇʎԀʎzɐɹƆ Jan 20 '17 at 18:48
12

See Boost Range Set algorithms. They still expect an output iterator though.

Maxim Egorushkin
3

No, and I don't think it ever will have something like that. It is a general principle in C++ that when a non-member function can do the job, you should not make that function a member. So it can't look like that, but Boost.Range may help you.

BigBoss
  • 3
    I think OP’s beef wasn’t with the nonmember functions – the member function implementation was just an example. And there are trivial (and not so trivial) ways of making OP’s code way more concise with a proper library without having to use member function (Boost.Range does offer such a way, I believe). – Konrad Rudolph Feb 26 '13 at 13:38