remove copy struct member from std::vector

Question

I have a vector of struct member like below:

struct pnt
{
    bool has;
    int num;
};
std::vector<pnt> myvector;

let have a sample vector like:

myvector (num): 3 4 4 3 5 5 7 8 9 10 10                                                  
myvector (has): 1 1 0 1 0 1 0 0 0 1 0

What I want to do is to find duplicated members (in terms of having same int num) and remove the one with false bool member. so that my vector become like this:

myvector (num): 3 4 3 5 7 8 9 10                                                         
myvector (has): 1 1 1 1 0 0 0 1

to do so, I write following function:

void removeDuplicatedPnt(pnt_vec& myvector)
{
  std::vector<pnt>::iterator pnt_iter;
  for( pnt_iter = myvector.begin(); pnt_iter != myvector.end(); ++pnt_iter)
  {
      if(pnt_iter->has)
      {
          if(pnt_iter->num == (pnt_iter+1)->num)
          {
              myvector.erase(pnt_iter+1);
          }
          if(pnt_iter == myvector.begin())
          {
             continue;
          }
          if(pnt_iter->num == (pnt_iter-1)->num)
          {
              myvector.erase(pnt_iter-1);
              pnt_iter++;
          } 
       }
    }
}

I could also do it by sequential checking of members. but the real vector could be very long. so that is why first I went to find the member with true boolean then I checked the the next and previous member. The question is that how I can modify above code in terms of efficiency and robustness.

NOTE: I can only use C++03 (not C++11). I can also use boos (version 1.53), so feel free if think there is any useful function there. :)

http://stackoverflow.com/questions/tagged/erase-remove-idiom http://en.m.wikipedia.org/wiki/Erase-remove_idiom — n. 'pronouns' m., Dec 05 '14 at 14:10
`if(pnt_iter->num == (pnt_iter-1)->num)` This will fail if `pnt_iter` is the first item in the vector. — PaulMcKenzie, Dec 05 '14 at 14:15
Please be aware that using iterators for this might not be the best option, as they might all be [invalidated when the vector changes](http://stackoverflow.com/questions/6438086/iterator-invalidation-rules). In this particular case, using explicit indices might be a better option. — Martin Prazak, Dec 05 '14 at 14:52
Also, for last item, the expression `png_iter+1` refers to an element behind the end of your array (i.e., a possible segfault). — Martin Prazak, Dec 05 '14 at 14:59

score 2 · Accepted Answer · answered Dec 05 '14 at 14:57

You can use this algorithm:

Collect all nums where has is true in a set<int>
Go through your vector<pnt> again, and remove all entries where has is false and num is present in the set<int>

Here is a sample implementation:

struct filter {
    set<int> seen;
    bool operator()(const pnt& p) {
        return !p.has && (seen.find(p.num) != seen.end());
    }
};
...
filter f;
for (vector<pnt>::const_iterator i = v.begin() ; i != v.end() ; i++) {
    if (i->has) {
        f.seen.insert(i->num);
    }
}
v.erase(remove_if(v.begin(), v.end(), f), v.end());

Demo.

sehe · Answer 2 · 2014-12-05T15:24:54.263

1

You can use std::sort and std::unique with a custom comparison predicate:

[](const pnt& a, const pnt& b) { return a.num < b.num; }

Here's a convenient way to demo it using Boost Range to reduce on typing:

Update C++03 version Live On Coliru

#include <boost/range.hpp>
#include <boost/range/algorithm.hpp>
#include <iostream>

using namespace boost;

struct pnt {
    int num;
    bool has;

    pnt(int num = 0, bool has = false) : num(num), has(has) {}

    friend bool operator<(pnt const& a, pnt const& b) { return a.num<b.num; }
    friend bool operator==(pnt const& a, pnt const& b) { return a.num==b.num; }
};

int main() {
    std::vector<pnt> v { {10,0 },{10,1 },{9,0 },{8,0 },{7,0 },{5,1 },{5,0 },{3,1 },{4,0 },{4,1 },{3,1 } };

    for (pnt p : boost::unique(boost::sort(v)))
        std::cout << "{ num:" << p.num << ", has:" << p.has << "}\n";
}

This actually has a few subtler points that make it possible to e.g. do

it = std::find(v.begin(), v.end(), 3); // find a `pnt` with `num==3`

But that's only tangentially related

edited Dec 05 '14 at 15:24

answered Dec 05 '14 at 14:54

sehe

328,274
43
416
565

@dasblinkenlight I was fixing my answer to be c++03. Note that you too are sorting his vector. In a set :) (I know that this is different, but it's not very big) – sehe Dec 05 '14 at 15:06
By "sorting" I meant changing the order of items in the vector itself. I am using a `set` only for the `num` portion, and store it separately from the vector itself. I wish I could use unordered_set and lambdas, but it's all C++11 :-( – Sergey Kalinichenko Dec 05 '14 at 15:11
@dasblinkenlight I wager that using an `unordered_set` would just (a) increase noise and support code (`std::hash`...) (b) make it slower. For the case in the OP I don't anticipate anything will be faster than the vector, even if you need to sort on a copy. But I readily admit I'm too lazy to prove it :) – sehe Dec 05 '14 at 15:12
@dasblinkenlight/OP found the time to add the C++03 version **[Live On Coliru](http://coliru.stacked-crooked.com/a/552a305fd18ad1dc)** – sehe Dec 05 '14 at 15:25
@sehe thanks for you answer, but your answer returned correct vector but sorted. – H'H Dec 05 '14 at 15:30
1

@H'H I know. It's ok if that wasn't intended. There's the other answer that shows how to do this. You could potentially find a better performance using a hybrid approach, but I don't think it matters. If it does, I think you need to reconsider the data structure anyways :) – sehe Dec 05 '14 at 15:42
@sehe Thanks for your answer again then :) – H'H Dec 05 '14 at 15:43

remove copy struct member from std::vector

2 Answers2