0

Coming from a C background, I'm trying to teach myself C++. I have been trying to implement the UnionFind data structure. I have a class as follows :-

class UnionFind
{
private:
    int count;
    UnionFind* parent;
    int val;

public:
    UnionFind(int _val) : count(1), parent(this), val(_val) {
        printf("UnionFind(): parent=%p this=%p\n", parent, this);
        }

    ~UnionFind() {
        printf("~UnionFind()\n");
    }

    int getcount()
    {
        if (parent == this)
            return count;
        return Find()->count;
    }

    int getval() { return val; }
    void setparent(UnionFind* _parent) { parent = _parent; }
    void setcount(int _count) { count = _count; }

    // Does a union of 2 sets
    void Union(UnionFind* u)
    {
        printf("Union(%d: this=%p, %d: u=%p)\n", val, this, u->getval(), u);
        UnionFind* this_parent = Find();
        UnionFind* u_parent = u->Find();
        printf("this_parent=%p u_parent=%p\n", this_parent, u_parent);

        if (this_parent == u_parent)
            return;

        if (this_parent->getcount() > u_parent->getcount()) {
            this_parent->setcount(this_parent->getcount() + u_parent->getcount());
            u_parent->setparent(this_parent);
        } else {
            u_parent->setcount(this_parent->getcount() + u_parent->getcount());
            this_parent->setparent(u_parent);
        }
    }

    // Returns the parent
    UnionFind* Find()
    {
        //printf("%d: Find(%p) = ", val, this);
        UnionFind* temp = parent;
        while (temp != temp->parent)
            temp = temp->parent;
        //printf("%p\n", temp);
        return temp;
    }
};

Later on I try to populate a vector with some instances of UnionFind as follows :-

vector<UnionFind> vecs;
for (int i = 0; i < N; i++)
    vecs.push_back(UnionFind(i));

Running this code shows that the UnionFind objects all have the same address. I suspected(and confirmed) that this is because the destructor of UnionFind is being called in each iteration of the loop.

My question is :-

  • I realise that I could create a vector of pointers to UnionFind objects as vector<UnionFind*> to solve this problem, but is that really the way to go? I read that unlike in C, it might be possible to solve a lot of problems in C++ without resorting to new/malloc or delete/free. What is the C++ way to solve this problem?

[EDIT]

Typo corrected.

[EDIT 2]

I'm trying to implement a disjoint-set/union-find data structure. "parent" will point to itself initially, but as unions occur parent might point to a different object.

[EDIT 3]

Added in full code.

learnlearnlearn
  • 591
  • 1
  • 3
  • 13
  • Vector does a lot of moving and shuffling of it's contents. That means that anything you put in it MUST be safe to copy and assign. It's not because `parent` is going to get messed around. Check out the [Rules of Three Five, and Zero](http://en.cppreference.com/w/cpp/language/rule_of_three). You can't do non-trivial C++ without having a grip on those rules. You also need to keep an eye on and the [Iterator invalidation rules](https://stackoverflow.com/questions/6438086/iterator-invalidation-rules) otherwise storing pointers to items in containers will cause nothing but pain. – user4581301 Apr 07 '18 at 23:05
  • Because of all that shuffling and resizing There is a lot of destruction is going on. But until `vector`'s internal storage is resized, the address of `vecs[0]` will always be the same no matter what `UnionFind` is currently occupying that slot. – user4581301 Apr 07 '18 at 23:06
  • Thanks for the links and the info, could you please state if there is a way to solve this without resorting to new/delete? – learnlearnlearn Apr 07 '18 at 23:08
  • What problem are you trying to solve exactly? Are you trying to make sure `parent == this` for every object? – Benjamin Lindley Apr 07 '18 at 23:12
  • How much do you need `parent`? With `parent`, you have a lot of book-keeping if you allow the object to be copied or moved. You are better off with the pointer so the address never changes, but [guard the pointer with a `std::unique_ptr`](http://en.cppreference.com/w/cpp/memory/unique_ptr) (`vector>`) so it's ownership is locked down and deletion is guaranteed. – user4581301 Apr 07 '18 at 23:13
  • @BenjaminLindley I've updated the question with some more details. I've updated it with the full code. – learnlearnlearn Apr 07 '18 at 23:17

2 Answers2

2

There's a few effects here, and you're misinterpreting what is going on. Creating a vector<UnionFind *> and using dynamically memory allocation will treat symptoms of your problem, but not the underlying cause.

Firstly, in each iteration of the loop,

vecs.push_back(UnionFind(i));

constructs a temporary UnionFind object, stores a copy of it in the vector, and then destroys the temporary. Although there is no guarantee, there is nothing stopping the temporaries all having the same address.

Second, when invoking the constructor with an int argument, each object stores its own address in its parent member. If the temporary objects are created at the same address, all of them will have parent member equal. Also note that the output from your program, will ONLY include information about the temporaries created by main().

Third, when the vector's push_back() appends a COPY of the object it receives, it copies the object (usually using the copy constructor). Your code has not implemented a copy constructor, so the compiler generates one automatically. The compiler-generated copy constructor (and assignment operator) copy all members by VALUE. So, when copying a UnionFind object to create another, the newly created object will have it's parent member pointing at the original object. So, assuming each object in the loop has the same address, every COPIED object will have equal parent members.

Fourth, since you've only implemented the one constructor, your output only includes information about the temporaries created in main(). You will see no information at all about the COPIES of those temporaries that are stored in the vector.

The same effect can also be seen, more simply with;

  UnionFind a(42);
  UnionFind b(a);

which is creating two objects (b as a COPY of a) but you will only see information about a being output (i.e. there are two objects, but information about only one of them is printed). If you want to see information about b as well, you will need to implement a copy constructor which does the job, something like;

UnionFind(const UnionFind &rhs) : count(rhs.count), parent(this), val(rhs.val)
{
    printf("UnionFind(): parent=%p this=%p\n", parent, this);
}

If you implement a copy constructor, it is usually necessary to implement other member functions as well. Look up the "rule of three" - a class that requires a hand-crafted copy constructor, as yours does, also often requires a hand-crafted assignment operator and a hand-crafted destructor. Also, I've over-simplified in my description above, so (in C++11 and later) look up the "rule of 5" since you will probably need to manually implement a move constructor as well to properly track objects.

More generally, however, if you are trying to learn C++, you are better off avoiding doing that by analogy with C. Techniques that are effective in C are often ineffective in C++, and vice versa. Learn C++ I/O rather than sticking with C I/O (the fact you CAN use C I/O in C++ doesn't mean it is as effective in C++ as it is in C). Avoid using malloc() and free() in C++, since they don't play will with C++ class types that have non-trivial constructors and destructors. In fact, avoid using dynamic memory allocation at all in C++ - there are, in most cases, safer alternatives.

Peter
  • 32,539
  • 3
  • 27
  • 63
1

Your union find objects don't all have the same address in the vector. You are constructing an object on the stack the stack object has the same address. Then it is being copy constructed and the copy is placed in the vector.

Where is the append function in std::vector ?

error: no member named 'append' in 'std::vector >'

Conrad Jones
  • 226
  • 3
  • 9
  • Thanks for the info, I have corrected the typo by changing append with push_back. Could you provide some more information on how to write this code without new/delete? – learnlearnlearn Apr 07 '18 at 23:05
  • @learnlearnlearn What you could do is to overload the = operator, having it set `parent` to `this`. Note that this is a workaround that works here but is not necessarily what one expects from an = operator. That said, my advice is to ditch the node representing class and instead have a class that represents the whole graph/the algorithm, storing the nodes in two vectors representing the node data and a integer typed index for the parent instead of a pointer. It's a procedural algorithm after all. Will also make the auto-update of improved UnionFind easier. – Aziuth Apr 07 '18 at 23:33