6

I noticed on the AVL Tree Wikipedia page the following comment:

"If each node additionally records the size of its subtree (including itself and its descendants), then the nodes can be retrieved by index in O(log n) time as well."

I've googled and have found a few places mentioning accessing by index but can't seem to find an explanation of the algorithm one would write.

Many thanks

[UPDATE] Thanks people. If found @templatetypedef answer combined with one of @user448810 links to particularly help. Especially this snipit:

"The key to both these functions is that the index of a node is the size of its left child. As long as we are descending a tree via its left child, we just take the index of the node. But when we have to move down the tree via its right child, we have to adjust the size to include the half of the tree that we have excluded."

Because my implementation is immutable I didn't need to do any additional work when rebalancing as each node calculates it's size on construction (same as the scheme impl linked)

My final implementation ended up being:

class Node<K,V> implements AVLTree<K,V> { ...
    public V index(int i) {
        if (left.size() == i) return value;
        if (i < left.size()) return left.index(i);
        return right.index(i - left.size() - 1);
    }
}

class Empty<K,V> implements AVLTree<K,V> { ...
    public V index(int i) { throw new IndexOutOfBoundsException();}
}

Which is slightly different from the other implementations, let me know if you think I have a bug!

1 Answers1

11

The general idea behind this construction is to take an existing BST and augment each node by storing the number of nodes in the left subtree. Once you have done this, you can look up the nth node in the tree by using the following recursive algorithm:

  • To look up the nth element in a BST whose root node has k elements in its left subtree:
    • If k = n, return the root node (since this is the zeroth node in the tree)
    • If n ≤ k, recursively look up the nth element in the left subtree.
    • Otherwise, look up the (n - k - 1)st element in the right subtree.

This takes time O(h), where h is the height of the tree. In an AVL tree, this O(log n). In CLRS, this construction is explored as applied to red/black trees, and they call such trees "order statistic trees."

You have to put in some extra logic during tree rotations to adjust the cached number of elements in the left subtree, but this is not particularly difficult.

Hope this helps!

ndmeiri
  • 4,728
  • 12
  • 32
  • 41
templatetypedef
  • 328,018
  • 92
  • 813
  • 992
  • While I agree that this is the right way to do things, it doesn't seem to be quite the same as the Wiki page is covering (which stores the size of the entire sub-tree, not the left sub-tree). What they're doing would (apparently) require a little clumsiness, deriving the size of the left subtree by subtracting the size of the right subtree or looking at the left subtree to find its size. – Jerry Coffin Jun 08 '12 at 21:08
  • @JerryCoffin- The approach I've sketched out is connected to this approach. If every subtree stores its total size, I can look up the size of the left subtree (if any) by just looking at the left child's total size. The two approaches are essentially isomorphic to one another. – templatetypedef Jun 08 '12 at 21:10
  • 3
    Thinking for another second, though, I see one advantage to their system: it supports indexing from either end, so indexing N from the end of the collection is just as easy as N from the start. – Jerry Coffin Jun 08 '12 at 21:14
  • 2
    This answer is correct. Look [here](http://programmingpraxis.codepad.org/pwYpvDFw") for a sample implementation in Scheme, and [here](http://programmingpraxis.com/2011/11/25/avl-trees/)/[here](http://programmingpraxis.com/2011/11/29/avl-trees-extended/) for a two-part explanation, including nth (find the nth key) and rank (determine the rank of a given key) functions. – user448810 Jun 08 '12 at 23:44