1

I'm working on an AVL Tree sorting algorithm, and I thought I finally had it figured out, thanks to the help of you all, until I realized that the runtime for it was taking significantly longer than the runtime for insertion sort, which shouldn't be correct. I'm using an unsorted array (or rather, vector) of randomly generated numbers. I'll provide some statistics and code below.

AVL

for (std::vector<int>::const_iterator i = numbers.begin(); i != numbers.begin()+30000; ++i)
{
    root = insert(root, numbers[x]);
    cout << "Height: " << height(root);
    x++;
    track++;
    if( (track % 10000) == 0)
    {
        cout << track << " iterations" << endl;
        time_t now = time(0);
        cout << now - begin << " seconds" << endl;
    }

}

N = 30,000

Height = 17

Number of iterations performed = ~1,730,000

Run time for sorting = 38 seconds

Insertion Sort

for (int i = 0; i < 30000; i++)
    {
        first++;
        cout << first << " first level iterations" << endl;
        time_t now = time(0);
        cout << now - begin << " seconds" << endl;
        int tmp = dataSet[i];
        int j;
        for (j = i; i > 0 && tmp < dataSet[j - 1]; j--)
        {
            dataSet[j] = dataSet[j - 1];
        }
        dataSet[j] = tmp;
    }
}

N = 30,000

Iterations = 30,000

Run time for sorting = 4 seconds

This can't possibly be right, so I was hoping that maybe you all could help figure out what's going on? As far as I can tell, all of my code was implemented correctly, but I'm still going to include the relevant parts below in case I missed something.

Source Code

    node* newNode(int element) // helper function to return a new node with empty subtrees
{
    node* newPtr = new node;
    newPtr->data = element;
    newPtr->leftChild = NULL;
    newPtr->rightChild = NULL;
    newPtr->height = 1;
    return newPtr;
}
node* rightRotate(node* p) // function to right rotate a tree rooted at p
{
    node* child = p->leftChild;
    node* grandChild = child->rightChild;

    // perform the rotation
    child->rightChild = p;
    p->leftChild = grandChild;

    // update the height for the nodes
    p->height = max(height(p->leftChild), height(p->rightChild)) + 1;
    child->height = max(height(child->leftChild), height(child->rightChild)) + 1;

    // return new root
    return child;
}
node* leftRotate(node* p) // function to left rotate a tree rooted at p
{
    node* child = p->rightChild;
    node* grandChild = child->leftChild;

    // perform the rotation
    child->leftChild = p;
    p->rightChild = grandChild;

    // update heights
    p->height = max(height(p->leftChild), height(p->rightChild)) + 1;

    // return new root
    return child;
}

int getBalance(node *p)
{
    if(p == NULL)
        return 0;
    else
        return height(p->leftChild) - height(p->rightChild);
}
// recursive version of BST insert to insert the element in a sub tree rooted with root
// which returns new root of subtree
node* insert(node*& n, int element)
{
    // perform the normal BST insertion
    if(n == NULL) // if the tree is empty
        return(newNode(element));
    if(element< n->data)
    {
        n->leftChild = insert(n->leftChild, element);
    }
    else
    {
        n->rightChild = insert(n->rightChild, element);
    }

    // update the height for this node
    n->height = 1 + max(height(n->leftChild), height(n->rightChild));

    // get the balance factor to see if the tree is unbalanced
    int balance = getBalance(n);

    // the tree is unbalanced, there are 4 different types of rotation to make
    // Single Right Rotation (Left Left Case)
    if(balance > 1 && element < n->leftChild->data)
    {
        return rightRotate(n);
    }
    // Single Left Rotation (Right Right Case)
    if(balance < -1 && element > n->rightChild->data)
    {
        return leftRotate(n);
    }
    // Left Right Rotation
    if(balance > 1 && element > n->leftChild->data)
    {
        n->leftChild = leftRotate(n->leftChild);
        return rightRotate(n);
    }
    // Right Left Rotation
    if(balance < -1 && element < n->rightChild->data)
    {
        n->rightChild = rightRotate(n->rightChild);
        return leftRotate(n);
    }
    // cout << "Height: " << n->height << endl;
    // return the unmodified root pointer in the case that the tree does not become unbalanced
    return n;
}
yms
  • 10,080
  • 3
  • 37
  • 66
Sean Madson
  • 41
  • 1
  • 6
  • 1
    Since insertion sort is really simple and AVL trees are not, even before trying anything it is reasonable to suspect that AVL-sort has a higher constant factor associated with it. – harold Apr 21 '17 at 19:53
  • @harold top of that he is calling height(root) in the middle of the loop, which increases even more that constant factor. – yms Apr 21 '17 at 20:17
  • I'm not sure I follow what you mean. Could you elaborate? – Sean Madson Apr 21 '17 at 21:16

1 Answers1

1

The best way to "debug" this kind of problems is to use a performance profiler tool, however in this case I think I can give you a good hypothesis:

In summary, your comparison is not "fair".

How to "fix" this?:

  • Experiment with memory management libraries, like Boost.Pool.

As an example, if you know in advance how many nodes your tree will have, you can allocate all the required nodes at once at the beginning of your algorithm, and create a pool of nodes with them (if you use Boost this should be a lot faster than calling the standard new operator every time, one by one).

Every time you need a new node, you take it from the pool. You can then "compare" the algorithms from the point where no additional memory allocations will be required.

Community
  • 1
  • 1
yms
  • 10,080
  • 3
  • 37
  • 66
  • So wait, you're saying that there might not even be anything wrong with my algorithm? Also, what would I be analyzing its performance for, exactly? Would it be allocations or CPU usage? Or something else? – Sean Madson Apr 21 '17 at 19:32
  • "So wait, you're saying that there might not even be anything wrong with my algorithm?" Yep, that's my hypothesis. – yms Apr 21 '17 at 19:44
  • "what would I be analyzing its performance for, exactly? Would it be allocations or CPU usage?" CPU usage, you want to see where in your code the CPU spends most of its execution time. My guess is that it will be the constructor of the Node class or something along this line. – yms Apr 21 '17 at 19:47
  • How would I go about creating a pool of nodes that shrinks in size each time I pull a node from it? – Sean Madson Apr 21 '17 at 19:52
  • Wouldn't it be node* pool[] instead, since you have to declare the datatype before declaring the array? – Sean Madson Apr 23 '17 at 01:28
  • Rewriting my comment since I cannot edit it: A simple implementation could be to use an array of nodes in the stack with an index of the last node used. I am assuming that you know in advance how many nodes your program will need. For example: 'node myPool[] = new node[poolSize];'. Note that I am not using explicit pointers there, and [this is on purpose](http://stackoverflow.com/questions/333443/c-object-instantiation) – yms Apr 24 '17 at 16:41