
I am studying algorithms on my own and turned to the Open Data Structures (C++ Edition) free ebook as a reference. In my attempt to master the topic, I am determined to finish all the challenges in the book. However, I am having a lot of trouble understanding how one particular, less common algorithm could have O(1) add() and remove().

One of the challenges is to create a random queue with the following properties (quoted from the exercises):

Exercise 2.2. Design and implement a RandomQueue. This is an implementation of the Queue interface in which the remove() operation removes an element that is chosen uniformly at random among all the elements currently in the queue. (Think of a RandomQueue as a bag in which we can add elements or reach in and blindly remove some random element.) The add(x) and remove() operations in a RandomQueue should run in constant time per operation.

The chapter deals with array-backed lists, so the addition and removal of elements is rather trivial in that sense. However, the array sometimes has to be recreated to change its size, and you are supposed to copy the old array into the new one, which is essentially O(n). I also believe that I need to use a circular array, so I would have to shift indices within the array, producing O(n-1) time complexity.

I am very confused about how to calculate and measure the running time of these algorithms. The book does talk about O(m), but it is rather vague at times.

Theorem 2.2. An ArrayQueue implements the (FIFO) Queue interface. Ignoring the cost of calls to resize(), an ArrayQueue supports the operations add(x) and remove() in O(1) time per operation. Furthermore, beginning with an empty ArrayQueue, any sequence of m add(x) and remove() operations results in a total of O(m) time spent during all calls to resize().

How can you just ignore that? I am not making the connection on how you can just lop off that portion of the time complexity.

Aaron
  • I think you want to take a look at Amortized O(1): http://stackoverflow.com/questions/200384/constant-amortized-time . Even if I'm wrong, good reading. – user4581301 Jan 19 '17 at 01:04

1 Answer


The resize is usually scheduled to happen at exponentially growing sizes. Say, when the size reaches 8, get a size-16 storage, move the contents there, and drop the old size-8 storage. When that fills up, get a size-32 storage, and so on. This way, the total cost of inserting n elements is still O(n), so the amortized cost per insertion is O(1): in total, the elements are moved n + n/2 + n/4 + n/8 + ... times, which is bounded by 2n and therefore linear.
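For concreteness, here is a minimal sketch of that doubling scheme. The class and its members (`GrowableArray`, `add`, `resize`) are invented for this illustration and are not the book's ArrayQueue; the point is only to show where the rare O(n) copy happens.

```cpp
#include <algorithm>
#include <cstddef>

// Minimal sketch of a doubling backing array (illustrative names only).
template <typename T>
class GrowableArray {
protected:
    T*          a;    // backing storage
    std::size_t n;    // number of elements in use
    std::size_t cap;  // current capacity

    void resize(std::size_t newCap) {
        T* b = new T[newCap];
        std::copy(a, a + n, b);  // the O(n) copy, but it happens rarely
        delete[] a;
        a = b;
        cap = newCap;
    }

public:
    GrowableArray() : a(new T[1]), n(0), cap(1) {}
    ~GrowableArray() { delete[] a; }
    GrowableArray(const GrowableArray&) = delete;
    GrowableArray& operator=(const GrowableArray&) = delete;

    void add(const T& x) {
        if (n == cap)
            resize(2 * cap);  // capacities grow as 1, 2, 4, 8, 16, ...
        a[n++] = x;           // the common case: a single O(1) write
    }

    std::size_t size() const { return n; }
};
```

Over n calls to `add`, the copies touch 1 + 2 + 4 + ... (up to the largest capacity reached) < 2n elements in total, which is the geometric sum above, so each add costs O(1) amortized even though an individual add that triggers `resize` costs O(n).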

The same can be done if resizing is needed when the structure shrinks: when a size-16 structure contains as few as 4 elements, get a size-8 storage, move the contents there, and drop the old size-16 storage.
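A matching remove path could be added to the hypothetical `GrowableArray` sketch above; the quarter-full threshold is a common choice that keeps grows and shrinks from alternating on every operation, not necessarily the book's exact rule.

```cpp
// Additional member of the GrowableArray sketch above (illustrative only).
// Removes and returns the last element; assumes the array is non-empty.
T removeLast() {
    T x = a[--n];
    // Shrink only when at most a quarter of the capacity is in use, so a
    // shrink is never immediately followed by a grow (and vice versa).
    if (cap > 1 && n <= cap / 4)
        resize(cap / 2);  // the array is still at most half full afterwards
    return x;
}
```

After the shrink the array is at most half full, so a constant fraction of the capacity's worth of operations must happen before the next resize, which is what keeps the total resize work linear in the number of operations.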

The powers of two can be replaced by powers of some other number, for example 1.5. The tradeoff is how much memory is wasted versus how much time is spent moving the contents. Both are O(n), but we can raise the constant factor of one and lower the other.

Perhaps the book does not want to deal with amortized complexity just yet, and so avoids discussing the details of memory management.

Gassa
  • So even though these methods have amortized complexity, they can still be considered O(1)? – Aaron Jan 19 '17 at 00:23
  • @Aaron The statement which is true is this: `n` insertions are done in `O(n)` total operations. We then say that each insertion has an _amortized_ cost of `O(1)`. Here is what we mean exactly by that: one individual insertion can still be as expensive as `O(n)` when the resize occurs, but such peaks are rare and become amortized among many other cheap insertions. – Gassa Jan 19 '17 at 00:27
  • 3
    If allocation itself is considered constant time, then these kinds of array lists can be deamortized to provide real O(1) costs. You allocate a new array when the old one fills up and then incrementally move items from the old one to the new one as new items are inserted. This trick is often used in the literature to let the author prove the target complexity using fixed-size data structures. – Matt Timmermans Jan 19 '17 at 00:33
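To illustrate the deamortization trick from the last comment, here is a minimal sketch: when the array fills up, a larger one is allocated but the contents are not copied immediately; instead, one pending element is migrated on each subsequent insertion. All names (`IncrementalArray`, `push`, `get`) are invented for this example.

```cpp
#include <cstddef>

// Sketch of a de-amortized growable array: keep the previous array around and
// drain it one element per insertion instead of copying everything at once.
template <typename T>
class IncrementalArray {
    T*          a;                 // current (larger) backing array
    std::size_t cap;               // capacity of `a`
    std::size_t n = 0;             // total number of elements stored
    T*          oldA = nullptr;    // previous backing array, being drained
    std::size_t oldSize = 0;       // how many elements oldA held at the switch
    std::size_t migrated = 0;      // how many of them were already copied to `a`

public:
    IncrementalArray() : a(new T[1]), cap(1) {}
    ~IncrementalArray() { delete[] oldA; delete[] a; }
    IncrementalArray(const IncrementalArray&) = delete;
    IncrementalArray& operator=(const IncrementalArray&) = delete;

    void push(const T& x) {
        if (n == cap) {
            // One element was migrated on every push since the last switch, so
            // oldA is fully drained by now and can be recycled.
            delete[] oldA;
            oldA     = a;
            oldSize  = n;
            migrated = 0;
            a   = new T[2 * cap];  // allocation itself treated as O(1)
            cap = 2 * cap;
        }
        a[n++] = x;                // new elements go straight into `a`
        if (migrated < oldSize) {  // migrate exactly one pending element
            a[migrated] = oldA[migrated];
            ++migrated;
        }
    }

    // Element i is in `a` if it was already migrated or was pushed after the
    // last switch; otherwise it still sits in oldA.
    const T& get(std::size_t i) const {
        return (i < migrated || i >= oldSize) ? a[i] : oldA[i];
    }

    std::size_t size() const { return n; }
};
```

Every `push` now does a constant amount of work in the worst case (one write plus at most one migration), at the price of temporarily holding both arrays in memory.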