4

I have a vector of struct member like below:

struct pnt
{
    bool has;
    int num;
};
std::vector<pnt> myvector;

let have a sample vector like:

myvector (num): 3 4 4 3 5 5 7 8 9 10 10                                                  
myvector (has): 1 1 0 1 0 1 0 0 0 1 0 

What I want to do is to find duplicated members (in terms of having same int num) and remove the one with false bool member. so that my vector become like this:

myvector (num): 3 4 3 5 7 8 9 10                                                         
myvector (has): 1 1 1 1 0 0 0 1 

to do so, I write following function:

void removeDuplicatedPnt(pnt_vec& myvector)
{
  std::vector<pnt>::iterator pnt_iter;
  for( pnt_iter = myvector.begin(); pnt_iter != myvector.end(); ++pnt_iter)
  {
      if(pnt_iter->has)
      {
          if(pnt_iter->num == (pnt_iter+1)->num)
          {
              myvector.erase(pnt_iter+1);
          }
          if(pnt_iter == myvector.begin())
          {
             continue;
          }
          if(pnt_iter->num == (pnt_iter-1)->num)
          {
              myvector.erase(pnt_iter-1);
              pnt_iter++;
          } 
       }
    }
}

I could also do it by sequential checking of members. but the real vector could be very long. so that is why first I went to find the member with true boolean then I checked the the next and previous member. The question is that how I can modify above code in terms of efficiency and robustness.

NOTE: I can only use C++03 (not C++11). I can also use boos (version 1.53), so feel free if think there is any useful function there. :)

H'H
  • 1,539
  • 1
  • 12
  • 32
  • http://stackoverflow.com/questions/tagged/erase-remove-idiom http://en.m.wikipedia.org/wiki/Erase-remove_idiom – n. 'pronouns' m. Dec 05 '14 at 14:10
  • `if(pnt_iter->num == (pnt_iter-1)->num)` This will fail if `pnt_iter` is the first item in the vector. – PaulMcKenzie Dec 05 '14 at 14:15
  • @dasblinkenlight sorry I edited the question – H'H Dec 05 '14 at 14:18
  • @PaulMcKenzie thanks for your notice, I edited the question – H'H Dec 05 '14 at 14:21
  • Please be aware that using iterators for this might not be the best option, as they might all be [invalidated when the vector changes](http://stackoverflow.com/questions/6438086/iterator-invalidation-rules). In this particular case, using explicit indices might be a better option. – Martin Prazak Dec 05 '14 at 14:52
  • Also, for last item, the expression `png_iter+1` refers to an element behind the end of your array (i.e., a possible segfault). – Martin Prazak Dec 05 '14 at 14:59

2 Answers2

2

You can use this algorithm:

  • Collect all nums where has is true in a set<int>
  • Go through your vector<pnt> again, and remove all entries where has is false and num is present in the set<int>

Here is a sample implementation:

struct filter {
    set<int> seen;
    bool operator()(const pnt& p) {
        return !p.has && (seen.find(p.num) != seen.end());
    }
};
...
filter f;
for (vector<pnt>::const_iterator i = v.begin() ; i != v.end() ; i++) {
    if (i->has) {
        f.seen.insert(i->num);
    }
}
v.erase(remove_if(v.begin(), v.end(), f), v.end());

Demo.

Sergey Kalinichenko
  • 675,664
  • 71
  • 998
  • 1,399
1

You can use std::sort and std::unique with a custom comparison predicate:

[](const pnt& a, const pnt& b) { return a.num < b.num; }

Here's a convenient way to demo it using Boost Range to reduce on typing:

Update C++03 version Live On Coliru

#include <boost/range.hpp>
#include <boost/range/algorithm.hpp>
#include <iostream>

using namespace boost;

struct pnt {
    int num;
    bool has;

    pnt(int num = 0, bool has = false) : num(num), has(has) {}

    friend bool operator<(pnt const& a, pnt const& b) { return a.num<b.num; }
    friend bool operator==(pnt const& a, pnt const& b) { return a.num==b.num; }
};

int main() {
    std::vector<pnt> v { {10,0 },{10,1 },{9,0 },{8,0 },{7,0 },{5,1 },{5,0 },{3,1 },{4,0 },{4,1 },{3,1 } };

    for (pnt p : boost::unique(boost::sort(v)))
        std::cout << "{ num:" << p.num << ", has:" << p.has << "}\n";
}

This actually has a few subtler points that make it possible to e.g. do

it = std::find(v.begin(), v.end(), 3); // find a `pnt` with `num==3`

But that's only tangentially related

sehe
  • 328,274
  • 43
  • 416
  • 565
  • @dasblinkenlight I was fixing my answer to be c++03. Note that you too are sorting his vector. In a set :) (I know that this is different, but it's not very big) – sehe Dec 05 '14 at 15:06
  • By "sorting" I meant changing the order of items in the vector itself. I am using a `set` only for the `num` portion, and store it separately from the vector itself. I wish I could use unordered_set and lambdas, but it's all C++11 :-( – Sergey Kalinichenko Dec 05 '14 at 15:11
  • @dasblinkenlight I wager that using an `unordered_set` would just (a) increase noise and support code (`std::hash`...) (b) make it slower. For the case in the OP I don't anticipate anything will be faster than the vector, even if you need to sort on a copy. But I readily admit I'm too lazy to prove it :) – sehe Dec 05 '14 at 15:12
  • @dasblinkenlight/OP found the time to add the C++03 version **[Live On Coliru](http://coliru.stacked-crooked.com/a/552a305fd18ad1dc)** – sehe Dec 05 '14 at 15:25
  • @sehe thanks for you answer, but your answer returned correct vector but sorted. – H'H Dec 05 '14 at 15:30
  • 1
    @H'H I know. It's ok if that wasn't intended. There's the other answer that shows how to do this. You could potentially find a better performance using a hybrid approach, but I don't think it matters. If it does, I think you need to reconsider the data structure anyways :) – sehe Dec 05 '14 at 15:42
  • @sehe Thanks for your answer again then :) – H'H Dec 05 '14 at 15:43