Quick select with repeat values

Question

Is it possible to perform searching kth elment in O(n) over multiset (values can repeat)?

Because as far as I understand the idea of quick select I have to partition input using some pivot. Then I have 2 arrays, which I choose for recursive searching depends on which index element I'm searching for + what are size of both arrays for instance:

1 7 8 5 3 2 4

Let's say pivot is 4 I'm searching second greatest element. So after partitioning I might get order like

1 3 2 4 7 8 5

Because right sub array consist of 3 elements I will still try to find second greatest in right array, if I'm correct?

But if I would take 8 as a pivot I might get something like

1 3 2 7 5 4 8

and therefore I will try to find greatest element within left table (propably by linear, but in general I will take left subarray and search for element - (|right subarray size| + 1))

But what about multisets? Let's say I have array:

4 5 6 7 7 7 4 3 2 1

and my pivot is 6 searching 3rd greatest element, after partition I receive:

4 5 3 2 4 1 6 7 7 7

so if I use approach that presented above I will try to perform recursive on right subarray while it's obvious third greatest value is 5 which is on left?

Only solution I came up with is use some data structure like BST, Set, etc. to O(nlogn) filter out repetitions. And then use O(n) quick select. However in total it would gave me non-linear approach, can this be done linear?

I have also an extra question, what if allocating memory cannot be done? And what I can do is only use local ints + stack recursion. The problem can be solved in O(n)? Because O(nlogn) can be done by sorting + linear "go through counting".

score 4 · Answer 1 · answered Jan 10 '13 at 22:56

4

I think this depends on your interpretation of "kth largest element." If by "kth largest element" you mean "the element that would be at position k within the array if it were sorted," then quickselect will work without modifications.

If, on the other hand, you mean "the kth largest of the distinct values in the array," then you are correct that an unmodified quickselect will not work correctly, as your example shows. However, you can modify the algorithm so that it works in expected O(n) time by adding all the elements to a hash table, then iterating over the hash table to get one copy of each distinct value. From there, you could use the normal quickselect algorithm on that generated array, which would require a total of O(n) time on expectation.

Hope this helps!

answered Jan 10 '13 at 22:56

templatetypedef

328,018
92
813
992

To be honest I haven't think about HT. But It should works. Anyway I'm also waiting for other suggestions. If there any :) – abc Jan 10 '13 at 23:01
I think of to change problem a bit. What would be fastest approach when allocating extra memory is not a valid operation. I can use only begin array with elements (can alloc only O(1) memory + recursive stack). Is that can be done in nlogn or n. Restriction implicate that I can't use structures. – abc Jan 10 '13 at 23:09
Well I answered myself that O(nlogn) can be done by simply sorting it + find how many neighbours values change in linear time. So it was dummy question. But still is combination: no extra memory + O(n) is reachable? – abc Jan 10 '13 at 23:26

Quick select with repeat values

1 Answers1