19

Here's a breakdown on the union/find algorithm for disjoint set forests on wikipedia:

  • Barebone disjoint-set forests... (O(n))
    • ... with union by rank ... (now improved to O(log(n))
      • ... with path compression (now improved to O(a(n)), effectively O(1))

Implementing union by rank necessitates that each node keeps a rank field for comparison purposes. My question is, is union by rank worth this additional space? What happens if I skip union by rank and just do path compression instead? Is it good enough? What is the amortized complexity now?


A comment is made that implies that union by rank without path compression (amortized O(log(n) complexity) is sufficient for most practical application. This is correct. What I'm asking is the other way around: what if you skip union by rank and ONLY do path compression instead?

In a sense, path compression is an extra step to improve union by rank, and that's why that extra step can be omitted without disastrous consequence. But is union by rank a necessary intermediate step to path compression? Can I skip it and go straight to path compression, or will that be catastrophic?


It was also pointed out that without union by rank, repeated unions could create a linked-list like structure. This means that a single path compression operation could take O(n) in the worst case. This would of course affect future operations, so how this plays out when amortized over many operations is what I'm more interested in.

polygenelubricants
  • 348,637
  • 121
  • 546
  • 611
  • My favorite paper on this: Yossi Shiloach, Uzi Vishkin: An O(log n) Parallel Connectivity Algorithm. J. Algorithms 3(1): 57-67 (1982) – Chad Brewbaker Jul 02 '10 at 06:12
  • On the other hand, even if a *future* single path compression operation could potentially take the worst case `O(n)` time, that worst-case path will become compressed because of that operation, so it is not the case that the same worst case time can occur repeatedly - at least not on the same path or any part of it. Therefore the analysis for worst-case repeated operation might be different from the worst-case single operation, I guess? – rwong Mar 25 '15 at 02:38

2 Answers2

7

I googled for "without union by rank" and the second link that came up was this one:

...We close this section with an analysis of union–find with path compression but without union by rank...

The union-find datastructure with path compression but without union by rank processes m find and n-1 link operations in time O((m+n) log n)

jkff
  • 16,670
  • 3
  • 46
  • 79
  • That is a great find! I admit that I didn't Google it first because I didn't think anyone would've thought of addressing it. This means that the amortized cost per operation *IS* `O(log n)`, and without the additional cost in space to keep track of the rank! – polygenelubricants Feb 24 '10 at 14:41
  • @polygenelubricants: An even cool trick you can do is if you **randomly** assign parents instead of by rank [Source](http://www.cis.upenn.edu/~sanjeev/papers/soda14_disjoint_set_union.pdf), you will still get `O(an)` instead of `O(log n)` without rank tracking. Its just 1 extra line of code :) – pathikrit Jun 06 '16 at 20:56
  • 1
    What is n and what is m, please? – Henry Nov 25 '20 at 16:57
  • 1
    "m find and n-1 link operations" – jkff Nov 26 '20 at 17:25
3

Path compression flattens the tree structure. Union by rank helps to merge. Assume that you skip the latter. So now, you have a forest with no rank information to choose how to merge. Potentially, you now run the risk of merging a tree with a larger depth to one with a smaller depth -- leading to an unbalanced tree structure. In the worst case, you may end up with a linked list. Your Union's amortized time complexity increases even if that for Find remains the same.

IMO, It'd be better to skip path compression but not rank.

dirkgently
  • 101,474
  • 16
  • 123
  • 183
  • 1
    I agree with what you said, but it would help if someone can do the rigorous analysis to show what the performance is with path compression but not union by rank. Unfortunately I'm not familiar with the techniques involved to do such analysis. I have no clue how inverse Ackermann function play a role in the complete implementation, for example. – polygenelubricants Feb 24 '10 at 03:24