13

I have a big vector of items that belong to a certain class.

struct item {
    int class_id;
    //some other data...
};

The same class_id can appear multiple times in the vector, and the vector is constructed once and then sorted by class_id. So all elements of the same class are next to each other in the vector.

I later have to process the items per class, ie. I update all items of the same class but I do not modify any item of a different class. Since I have to do this for all items and the code is trivially parallelizable I wanted to use Microsoft PPL with Concurrency::parallel_for_each(). Therefore I needed an iterator and came up with a forward iterator that returns the range of all items with a certain class_id as proxy object. The proxy is simply a std::pair and the proxy is the iterator's value type.

using item_iterator = std::vector<item>::iterator;
using class_range = std::pair<item_iterator, item_iterator>;

//iterator definition
class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range> { /* ... */ };

By now I was able to loop over all my classes and update the items like this.

std::vector<item> items;
//per_class_* returns a per_class_iterator
std::for_each(items.per_class_begin(), items.per_class_end(),
[](class_range r) 
{ 
    //do something for all items in r 
    std::for_each(r.first, r.second, /* some work */);
});

When replacing std::for_each with Concurrency::parallel_for_each the code crashed. After debugging I found the problem to be the following code in _Parallel_for_each_helper in ppl.h at line 2772 ff.

// Add a batch of work items to this functor's array
for (unsigned int _Index=0; (_Index < _Size) && (_First != _Last); _Index++)
{
    _M_element[_M_len++] = &(*_First++);
}

It uses postincrement (so a temporary iterator is returned), dereferences that temporary iterator and takes the address of the dereferenced item. This only works if the item returned by dereferencing a temporary object survives, ie. basically if it points directly into the container. So fixing this is easy, albeit the per class std::for_each work loop has to be replaced with a for-loop.

//it := iterator somewhere into the vector of items (item_iterator)
for(const auto cur_class = it->class_id; cur_class == it->class_id; ++it)
{
    /* some work */
}

My question is if returning proxy objects the way I did is violating the standard or if the assumption that every iterator dereferences into permanent data has been made by Microsoft for their library, but is not documented. At least I could not find any documentation on the iterator requirements for parallel_for_each() except that either a random access or a forward iterator are expected. I have seen the question about forward iterators and vector but since my iterator's reference type is const value_type& I still think my iterator is ok by the standard. So is a forward iterator returning a proxy object still a valid forward iterator? Or put another way, is it ok for an iterator to have a value type different from a type that is actually stored somewhere in a container?

Compilable example:

#include <vector>
#include <utility>
#include <cassert>
#include <iterator>
#include <memory>
#include <algorithm>
#include <iostream>

#include <ppl.h>


using identifier = int;
struct item
{
    identifier class_id;
    // other data members
    // ...

    bool operator<(const item &rhs) const
    {
        return class_id < rhs.class_id;
    }

    bool operator==(const item &rhs) const
    {
        return class_id == rhs.class_id;
    }

    //inverse operators omitted
};
using container = std::vector<item>;
using item_iterator = typename container::iterator;
using class_range = std::pair<item_iterator, item_iterator>;

class per_class_iterator : public std::iterator<std::forward_iterator_tag, class_range>
{
public:
    per_class_iterator() = default;
    per_class_iterator(const per_class_iterator&) = default;
    per_class_iterator& operator=(const per_class_iterator&) = default;

    explicit per_class_iterator(container &data) :
        data_(std::addressof(data)),
        class_(equal_range(data_->front())), //this would crash for an empty container. assume it's not.
        next_(class_.second)
    {
        assert(!data_->empty()); //a little late here
        assert(std::is_sorted(std::cbegin(*data_), std::cend(*data_)));
    }

    reference operator*()
    {
        //if data_ is unset the iterator is an end iterator. dereferencing end iterators is bad.
        assert(data_ != nullptr);
        return class_;
    }

    per_class_iterator& operator++()
    {
        assert(data_ != nullptr);

        //if we are at the end of our data
        if(next_ == data_->end())
        {
            //reset the data pointer, ie. make iterator an end iterator
            data_ = nullptr;
        }
        else
        {
            //set to the class of the next element
            class_ = equal_range(*next_);
            //and update the next_ iterator
            next_ = class_.second;
        }

        return *this;
    }

    per_class_iterator operator++(int)
    {
        per_class_iterator tmp{*this};
        ++(*this);
        return tmp;
    }

    bool operator!=(const per_class_iterator &rhs) const noexcept
    {
        return (data_ != rhs.data_) ||
            (data_ != nullptr && rhs.data_ != nullptr && next_ != rhs.next_);
    }

    bool operator==(const per_class_iterator &rhs) const noexcept
    {
        return !(*this != rhs);
    }

private:
    class_range equal_range(const item &i) const
    {
        return std::equal_range(data_->begin(), data_->end(), i);
    }

    container* data_ = nullptr;
    class_range class_;
    item_iterator next_;
};

per_class_iterator per_class_begin(container &c)
{
    return per_class_iterator{c};
}

per_class_iterator per_class_end()
{
    return per_class_iterator{};
}

int main()
{
    std::vector<item> items;
    items.push_back({1});
    items.push_back({1});
    items.push_back({3});
    items.push_back({3});
    items.push_back({3});
    items.push_back({5});
    //items are already sorted

//#define USE_PPL
#ifdef USE_PPL
    Concurrency::parallel_for_each(per_class_begin(items), per_class_end(),
#else
    std::for_each(per_class_begin(items), per_class_end(),
#endif
        [](class_range r)
        {
            //this loop *cannot* be parallelized trivially
            std::for_each(r.first, r.second,
                [](item &i)
                {
                    //update item (by evaluating all other items of the same class) ...
                    //building big temporary data structure for all items of same class ...
                    //i.processed = true;
                    std::cout << "item: " << i.class_id << '\n';
                });
        });

    return 0;
}
user2460318
  • 165
  • 9
  • What is `items`? Is it a `std::vector`? – wally May 12 '16 at 16:20
  • 1
    First look I'd think that an iterator returning a proxy would be ok and was mildly upset that `parallel_for_each` would use direct pointers. But cppreference.com lists this as a condition for forward iterator "If a and b compare equal (a == b is contextually convertible to true) then either they are both non-dereferenceable or *a and *b are references bound to the same object" which would be a problem because the two proxies (even if they look and act the same) would not be the same object. Though I don't know how strictly this is held up by the standard's wording (haven't looked). – kmdreko May 12 '16 at 16:20
  • Why not `using per_class_iterator = std::vector::iterator` instead of `class per_class_iterator : public std::iterator { /* ... */ };`? – wally May 12 '16 at 16:25
  • Or how about a `std::multimap` with class_id as the key? – wally May 12 '16 at 16:28
  • @flatmouse: Sorry, items is std::vector. I've edited my post. (That also explains your second question, why per_class_iterator is not a typedef to a std::vector::iterator, because that vector does not exist.) And multimap is not used because I know I want a std::vector, since I can preallocate the exact number of elements and do not need any tree structures. I've profiled the code and a sorted std::vector is the fastest and most space efficient version I tested. Even an unordered_multimap allocates unused buckets which is not acceptable in my case. – user2460318 May 12 '16 at 16:45
  • If `items` contains `item` elements, then how can the lambda `[](class_range r)` accept `class_range` items? – wally May 12 '16 at 16:47
  • @flatmouse: You are absolutely right. I fixed my code again. I guess my brain compiler is not working, so I should have tested my example code in a real compiler. Sorry. – user2460318 May 12 '16 at 16:54
  • Would it help to do a loop of `std::equal_range` calls and use those iterators for your parallel call? – Galik May 12 '16 at 17:05
  • From what is posted this should work, but if it is not working then there must be something else. We don't know what is happening in ` { /* ... */ }` for the class iterator and it might help to see what is done in `per_class_*`. I'm still curious to know why `per_class_iterator` is declared the way it is. Please post an [mcve](stackoverflow.com/help/mcve). – wally May 12 '16 at 17:10
  • @vu1p3n0x: I think you are right. Especially the third point (about incrementing `It`) of the [Multipass guarantee](http://en.cppreference.com/w/cpp/concept/ForwardIterator) (where your quote is taken from) says "either `It` is a raw pointer type or [...]" which a proxy is clearly not. I have to check if a proxy iterator can fulfill the other condition where `(void)++It(a), *a` should be equivalent to `*a`. – user2460318 May 12 '16 at 17:10
  • @Galik That's basically what the iterator does. It calls `std::equal_range` and stores the returned result. Incrementing calls `equal_range` for the next `class_id` (read from the end iterator of the previous call to equal_range). I will provide an actually compilable example as flatmouse suggested tomorrow, since I do not have access to MSVC now. – user2460318 May 12 '16 at 17:22
  • @user2460318 But your iterator is using a proxy. I thought making an equal_range loop directly would give you direct iterators and shouldn't really be much more cumbersome that the loop you have. – Galik May 12 '16 at 17:32
  • 3
    There have always been problems with proxy iterators - perhaps the newer +14 or +17 versions of the language enable them with no (or fewer) problems. But, for example, see this article: [To Be or Not to Be (an Iterator), by Eric Niebler](http://ericniebler.com/2015/01/28/to-be-or-not-to-be-an-iterator/) for an explanation of the issues - which includes a discussion of why the "requirements for an iterator" are not exactly what you think they are (for forward, random, etc.). – davidbak May 12 '16 at 19:51
  • @flatmouse "Minimal" example added. It's still just below 150 lines. – user2460318 May 13 '16 at 07:20
  • @Galik I had loops using equal_range directly, but they cannot be parallelized using PPL. `parallel_for_each` requires iterators and `parallel_for` requires a fixed increment. That's why I tried hiding the equal_range call behind an iterator in the first place. – user2460318 May 13 '16 at 07:23
  • @user2460318 But `std::equal_range` returns 2 iterators. Isn't that what you need for `parallel_for_each`? – Galik May 13 '16 at 07:26
  • @vu1p3n0x I think I have tested all requirements of the multipass guarantee given at cppreference and I think my iterator should be a valid forward iterator as long as "same object" means equality comparable. But I have to assume equality comparable is not enough to be the same object. – user2460318 May 13 '16 at 07:27
  • @Galik `std::equal_range` returns two iterators describing the range of same class items. I cannot run in parallel over items of the same class. I can only process different classes in parallel. (See my compilable example or my comment to DarkWanderers answer.) – user2460318 May 13 '16 at 09:43

2 Answers2

6

When you're writing a proxy iterator, the reference type should be a class type, precisely because it can outlive the iterator it is derived from. So, for a proxy iterator, when instantiating the std::iterator base should specify the Reference template parameter as a class type, typically the same as the value type:

class per_class_iterator : public std::iterator<
    std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range*, class_range>
                                                                          ~~~~~~~~~~~

Unfortunately, PPL is not keen on proxy iterators and will break compilation:

ppl.h(2775): error C2338: lvalue required for forward iterator operator *
ppl.h(2772): note: while compiling class template member function 'Concurrency::_Parallel_for_each_helper<_Forward_iterator,_Function,1024>::_Parallel_for_each_helper(_Forward_iterator &,const _Forward_iterator &,const _Function &)'
        with
        [
            _Forward_iterator=per_class_iterator,
            _Function=main::<lambda_051d98a8248e9970abb917607d5bafc6>
        ]

This is actually a static_assert:

    static_assert(std::is_lvalue_reference<decltype(*_First)>::value, "lvalue required for forward iterator operator *");

This is because the enclosing class _Parallel_for_each_helper stores an array of pointers and expects to be able to indirect them later:

typename std::iterator_traits<_Forward_iterator>::pointer    _M_element[_Size];

Since PPL doesn't check that pointer is actually a pointer, we can exploit this by supplying a proxy pointer with an operator* and overloading class_range::operator&:

struct class_range_ptr;
struct class_range : std::pair<item_iterator, item_iterator> {
    using std::pair<item_iterator, item_iterator>::pair;
    class_range_ptr operator&();
};
struct class_range_ptr {
    class_range range;
    class_range& operator*() { return range; }
    class_range const& operator*() const { return range; }
};
inline class_range_ptr class_range::operator&() { return{*this}; }

class per_class_iterator : public std::iterator<
    std::forward_iterator_tag, class_range, std::ptrdiff_t, class_range_ptr, class_range&>
{
    // ...

This works great:

item: item: 5
1
item: 3item: 1

item: 3
item: 3
Press any key to continue . . .
ecatmur
  • 137,771
  • 23
  • 263
  • 343
  • Sorry for the long delay when replying. I had a long weekend. Though your solution clearly violates the forward iterator requirement (`iterator::reference != T&`) it is a perfectly working solution that allows to fool PPL and accept the iterator. Thanks for this suggestion, I really did not think about overloading `operator&()`. – user2460318 May 17 '16 at 06:24
  • @ecatmur: Thank you very much for this answer. After reading it everything clicked for my problem. It works perfectly. I still think ppl is doing "the wrong thing" though by assuming that iterator values must must point to somehow externally allocated data. – starmole Dec 08 '16 at 04:28
0

For your direct question, no, iterator does not have to be something which is related to any kind of container. About only requirements for an iterator are for it to be:

  • be copy-constructible, copy-assignable and destructible
  • support equality/inequality
  • be dereferencable

Iterator does not necessarily has to be tied to a particular container (see generators), and so it cannot be said that "it has to has same type as container" - because there is no container in generic case.

It seems, hovever, having a custom iterator class may be actually an overkill in your case. Here's why:

In C++, array/vector end iterator is and iterator pointing just behind the end of the last item.

Given a vector of objects of "classes" (in your definition) A,B,C, etc., filled like following:

AAAAAAABBBBBBBBBBBBCCCCCCCD.......

You can just take regular vector iterators that will act as your range starts and ends:

AAAAAAABBBBBBBBBBBBCCCCCCCD......Z
^      ^           ^      ^       ^
i1     i2          i3     i4      iN

For the 4 iterators you see here, following is true:

  1. i1 is begin iterator for class A
  2. i2 is end iterator for class A and begin iterator for class B
  3. i3 is end iterator for class B and begin iterator for class C etc.

Hence, for each class you can have a pair of iterators which are start and end of the respective class range.

Hence, your processing is as trivial as:

for(auto it = i1; i!= i2; i++) processA(*it);
for(auto it = i2; i!= i3; i++) processB(*it);
for(auto it = i3; i!= i4; i++) processC(*it);

Each loop being trivially parallelizable.

parallel_for_each (i1; i2; processA);
parallel_for_each (i2; i3; processB);
parallel_for_each (i3; i4; processC);

To use a range-based for, you can introduce a substitute range class:

class vector_range<T> {
    public:
        vector<T>::const_iterator begin() {return _begin;};
        vector<T>::const_iterator end() {return _end;};
    // Trivial constructor filling _begin and _end fields
}

That is to say, you don't really need a proxy iterators to parallelize loops - the way C++ iterators are done is already ideally covers your case.

DarkWanderer
  • 8,544
  • 1
  • 23
  • 53
  • This is true, but the opposite of what I actually want. I want to process items of _different classes in parallel_, whereas items of the _same class_ have to be _processed in one thread_. I have added a compilable example that shows what I try to achieve. By reading the comments and links provided in these comments I get the impression that a forward iterator currently has to return a real reference. So it might be that my iterator is violating the ForwardIterator contract. – user2460318 May 13 '16 at 09:21