
I want to repeatedly rearrange an array or std::vector so that the minimum is the first element, the maximum is the last element, and arr[(0+lastIdx)/2] is the median, with every element before the median less than the median and every element after it greater. After each query for the min, max and median, I make changes to the data, and I want to query those three values again quickly.

Every time I want to rearrange the array, the array is a different array with same size.

Using std::nth_element I can put the median in the right place, and then I can iterate over the array to find the min and max. For a single array this is O(n), which clearly cannot be improved upon asymptotically (except perhaps for the constant factor in front of O(n)).
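A minimal sketch of the approach described above, assuming an element type that supports `operator<` (a `struct{double,double,double}` would need a comparator). Once `std::nth_element` has placed the median at index `(n-1)/2`, every element before it is no greater and every element after it is no smaller, so the minimum must be in the left part and the maximum in the right part; each scan then only covers about half the array. The function name `arrange_min_median_max` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: min at index 0, max at index n-1, median at (n-1)/2.
template <typename T>
void arrange_min_median_max(std::vector<T>& a) {
    if (a.size() < 3) {                 // trivial sizes: sorting is enough
        std::sort(a.begin(), a.end());
        return;
    }
    const std::size_t mid = (a.size() - 1) / 2;
    // O(n) on average: a[mid] becomes the median, left part <= it, right part >= it.
    std::nth_element(a.begin(), a.begin() + mid, a.end());
    // The minimum lies in [0, mid); moving it to 0 keeps the partition intact.
    std::iter_swap(a.begin(), std::min_element(a.begin(), a.begin() + mid));
    // The maximum lies in (mid, n); moving it to n-1 keeps the partition intact.
    std::iter_swap(a.end() - 1, std::max_element(a.begin() + mid + 1, a.end()));
}
```

For `{3, 7, 9, 6, 8, 1, 4, 5, 2}` this leaves 1 at index 0, 5 at index 4 and 9 at index 8, with the other elements only partitioned around the median.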

I need to operate on an array as follows: first I rearrange the array, then I do something else that makes the arranged array completely unarranged again, but no new values are inserted. Then I repeat this procedure over and over again.

Aaron McDaid
Alaya
  • How big is the array? How many times is the functionality performed? Is there any benefit to optimizing the algorithm? – Thomas Matthews Apr 24 '15 at 14:49
  • So basically... you want to sort an array. – David says Reinstate Monica Apr 24 '15 at 14:49
  • @DavidGrinberg No, he doesn't want to sort, because this can be done faster than sorting: sorting takes `O(n log n)`, while finding the min/max/median can be done in `O(n)` or maybe even faster (I don't know how). – FalconUA Apr 24 '15 at 14:50
  • There would be millions of `struct{double,double,double}` elements in the array. @ThomasMatthews – Alaya Apr 24 '15 at 14:51
  • @FalconUA, I think `O(n)` might just be the optimal complexity; I just wonder if I could get a better constant factor. – Alaya Apr 24 '15 at 14:52
  • Do you already know the median value ? – Othman Benchekroun Apr 24 '15 at 14:53
  • @DavidGrinberg Why is that ? If you need an apple you don't have to cut a tree – Othman Benchekroun Apr 24 '15 at 14:54
  • @FalconUA Yes, you can find the min, max and median in O(n), but the OP also wants to arrange the elements in the container so that all elements before the median are less than the median and all elements after it are greater, which implies a sort, and O(n log n) is the best you can do on that. – NathanOliver Apr 24 '15 at 14:57
  • @Othman But he needs an apple, a branch, a trunk, some leaves and some roots. All that's missing is a bit of bark. The only difference between what he wants and a sorted array is that elements 1-(size/2) and (size/2)-size are only partially sorted. – David says Reinstate Monica Apr 24 '15 at 14:57
  • @FalconUA But he wants more than that: he also wants to partially sort the items so that items less than the median are to the left and greater ones are to the right. It's 99% of sorting. I don't see any way this can be done without essentially sorting the array. – David says Reinstate Monica Apr 24 '15 at 14:58
  • @NathanOliver, `std::nth_element` can rearrange the array in `O(n)` and make sure that no element before the nth element is greater than it and no element after it is less. So I can use that function to rearrange the array. – Alaya Apr 24 '15 at 14:59
  • @Alaya, the QuickSort algorithm uses this: in each step it places an element in its final position. Maybe apply this part of the algorithm to the minimum, maximum and median. – Othman Benchekroun Apr 24 '15 at 15:01
  • @DavidGrinberg He doesn't need to sort it anyway ... He already can do better – Othman Benchekroun Apr 24 '15 at 15:04
  • @Othman He doesn't need to sort it, but he needs to get 90% of the way there. – David says Reinstate Monica Apr 24 '15 at 15:05
  • All the rearrangements would happen in the same array (with some added or removed elements)? How often? – Juan Lopes Apr 24 '15 at 15:06
  • @DavidGrinberg This is not about percentage, it's about complexity, O(n) and O(nlogn) are not quite the same ... – Othman Benchekroun Apr 24 '15 at 15:07
  • @Othman And I'm saying that I don't believe it's possible to get this in O(n) because of the amount of work required. – David says Reinstate Monica Apr 24 '15 at 15:12
  • When you say the rearrangement would happen a lot of times, is this on the same array? Do you need an algorithm that exploits the work done by previous calls? Could you track the changes as you insert and remove elements? – sh1 Apr 24 '15 at 15:18
  • @sh1 every time I want to rearrange the array, the array is a different array with same size. – Alaya Apr 24 '15 at 15:21
  • Do you know anything interesting about the distribution of the data? Or the order in which it might arrive? Should previous medians correlate with the next median? Does the answer need to be exact? – sh1 Apr 24 '15 at 15:25
  • "because the rearrangement would happen a lot of times and would be the bottleneck of performance." *Please edit the question to clarify this.* Are you implying that you will need to repeat this procedure many times, while making (small) changes to the array? If so, this completely changes your question – Aaron McDaid Apr 24 '15 at 15:45
  • @AaronMcDaid , Yes I need to operate on an array, firstly, I rearrange the array, and then do something else, this would make the arranged array totally unarranged again, but no new values are inserted. then, I repeat this procedure over and over again. – Alaya Apr 24 '15 at 15:48
  • I've made some edits to the question. It really is important to make readable changes directly to the question. Comments are not intended to provide clarification, the clarification should go into the question. – Aaron McDaid Apr 24 '15 at 15:54
  • "and then do something else, this would make the arranged array totally unarranged again". Sorry, but this is quite confusing. You need to tell us much more. If you simply replace all the values with new (unsorted) values, then there is nothing to be done to speed it up. – Aaron McDaid Apr 24 '15 at 15:58
  • when you say the array is different you mean it has been shuffled by something else correct? – UmNyobe Apr 24 '15 at 16:02
  • @UmNyobe, yes, that is right. – Alaya Apr 24 '15 at 16:48
  • If it's only been shuffled then min, max, and median won't have changed, which could simplify the problem. – sh1 Apr 24 '15 at 18:18

5 Answers


Even if you somehow managed to cut the cost of locating the min, max and median to zero, you would still need to move the misplaced elements below or above the median, which means up to n/2 - 1 swaps each time in the worst case.

  1. First pass, find the minimum and its position (O(n) time, O(1) space)
  2. Second pass, find the maximum and its position (O(n) time, O(1) space)
  3. Third pass, find the median and its position using the median-of-medians algorithm (O(n) time, O(1) space)
  4. Put min, max, median at their respective positions (0, n-1, n/2) by swapping
  5. Now keep two indexes: one for "below", starting at 0, and one for "above", starting at n-1. While the current below element is where it should be, increment that index. While the current above element is where it should be, decrement that index. When misplaced elements are found, swap the below and above elements. Repeat as long as below < above. (O(n) time, O(1) space)

Rationale for steps 4 and 5: the number of elements that are below the median when they should be above is exactly the number of elements that are above when they should be below, so every misplaced element has a partner to swap with.
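The five steps above can be sketched in C++ as follows, assuming distinct `int` values. As a simplification, `std::nth_element` on a copy stands in for the median-of-medians pass; unlike the O(1) space claimed in step 3, this uses O(n) extra memory and is average-case rather than worst-case linear. The function name `arrange_by_swaps` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of steps 1-5; assumes all values are distinct.
void arrange_by_swaps(std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n == 0) return;
    const std::size_t mid = (n - 1) / 2;

    // Step 3 (value only): stand-in for median-of-medians, on a copy.
    std::vector<int> tmp(a);
    std::nth_element(tmp.begin(), tmp.begin() + mid, tmp.end());
    const int med = tmp[mid];

    // Steps 1, 2 and 4: swap min to 0, max to n-1, median to mid.
    std::iter_swap(a.begin(), std::min_element(a.begin(), a.end()));
    std::iter_swap(a.end() - 1, std::max_element(a.begin(), a.end()));
    std::iter_swap(a.begin() + mid, std::find(a.begin(), a.end(), med));

    // Step 5: walk two indexes inward and swap misplaced pairs.
    std::size_t lo = 0, hi = n - 1;
    while (true) {
        while (lo < hi && a[lo] <= med) ++lo;  // already on the correct side
        while (lo < hi && a[hi] >= med) --hi;  // already on the correct side
        if (lo >= hi) break;
        std::swap(a[lo], a[hi]);               // a[lo] > med and a[hi] < med
    }
}
```

The median, minimum and maximum are never swapped in step 5, because only elements strictly greater or strictly less than the median are exchanged. On the input of the worked example below, `{3, 7, 9, 6, 8, 1, 4, 5, 2}`, this performs the same swaps and produces `{1, 4, 2, 3, 5, 6, 7, 8, 9}`.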

Of course, you can merge passes 1, 2 and 3 into one function, but that doesn't affect the complexity.
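Merging passes 1 and 2 can still improve the constant factor: processing the elements in pairs costs 3 comparisons per 2 elements instead of 4, i.e. about 3n/2 comparisons instead of 2n. A minimal sketch, assuming a non-empty `std::vector<double>`; `min_max_pairwise` is a hypothetical name. The standard `std::minmax_element` has the same comparison bound and can be used directly instead.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns {index of minimum, index of maximum}; assumes a is non-empty.
std::pair<std::size_t, std::size_t> min_max_pairwise(const std::vector<double>& a) {
    const std::size_t n = a.size();
    std::size_t imin = 0, imax = 0;
    std::size_t i = 1;
    if (n % 2 == 0) {                 // even size: seed min/max from the first pair
        if (a[1] < a[0]) imin = 1; else imax = 1;
        i = 2;
    }
    for (; i + 1 < n; i += 2) {       // remaining elements, two at a time
        std::size_t small = i, big = i + 1;
        if (a[big] < a[small]) std::swap(small, big);  // 1 comparison within the pair
        if (a[small] < a[imin]) imin = small;          // 1 comparison against the min
        if (a[imax] < a[big]) imax = big;              // 1 comparison against the max
    }
    return {imin, imax};
}
```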

Let's run it on:

{3, 7, 9, 6, 8, 1, 4, 5, 2}

pass 1, 2, 3 : min_pos = 5, max_pos = 2, median_pos = 7, median = 5

swap (0, min_pos)    -> {1, 7, 9, 6, 8, 3, 4, 5, 2}
swap (8, max_pos)    -> {1, 7, 2, 6, 8, 3, 4, 5, 9}
swap (4, median_pos) -> {1, 7, 2, 6, 5, 3, 4, 8, 9}

Now below_pos = 0 and above_pos = 8; the elements there are not misplaced. The next misplaced below element is 7 at position 1. The next misplaced above element is 4 at position 6.

swap (1, 6) -> {1, 4, 2, 6, 5, 3, 7, 8, 9}

The next misplaced below element is 6 at position 3. The next misplaced above element is 3 at position 5.

swap (3, 5) -> {1, 4, 2, 3, 5, 6, 7, 8, 9}

And the algorithm outputs:

{1, 4, 2, 3, 5, 6, 7, 8, 9}
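To check an output like this one, a small checker for the target layout can help. A sketch assuming `int` values; `is_arranged` is a hypothetical helper. It accepts exactly the arrays where a[0] is a minimum, a[n-1] is a maximum, and a[(n-1)/2] has no greater element before it and no smaller element after it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical checker for the min / median / max layout asked for in the question.
bool is_arranged(const std::vector<int>& a) {
    const std::size_t n = a.size();
    if (n == 0) return true;
    const std::size_t mid = (n - 1) / 2;
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < a[0] || a[n - 1] < a[i]) return false;  // min at front, max at back
        if (i < mid && a[mid] < a[i]) return false;         // left part <= median
        if (i > mid && a[i] < a[mid]) return false;         // right part >= median
    }
    return true;
}
```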

UmNyobe
  • Wouldn't you do steps one and two simultaneously? – NathanOliver Apr 24 '15 at 15:13
  • @NathanOliver It is the same in terms of complexity. – Samy Arous Apr 24 '15 at 15:14
  • @UmNyobe How would you keep track of the median after each insertions? and what about the min-max if they are deleted? Or would you have to rebuilt the index in these cases? – Samy Arous Apr 24 '15 at 15:14
  • given that min, max and median are where they should be in a sorted sequence, you will never swap with their index in step 5. Hope I understood correctly. – UmNyobe Apr 24 '15 at 15:19
  • @NathanOliver It is actually O(2n) in both cases. You need to do 2n comparisons in both cases. – Samy Arous Apr 24 '15 at 15:19
  • @UmNyobe when you insert a number, you check if it is smaller than the median and you put it on the lower half ( median position is still correct ). Then you insert a second number which is also smaller than the current median, but now, the median is no longer valid ( it has shifted twice ) and you will need to search for a valid median in the lower half. or am I missing something here? – Samy Arous Apr 24 '15 at 15:23
  • Step 1 and 2 both say minimum, one should be maximum – Sami Kuhmonen Apr 24 '15 at 16:07
  • Finding minimum and maximum together has smaller constant than separately. (3/2 instead of 2.) – Tacet Apr 25 '15 at 13:21

You can do it in something like O(sqrt(n) log n) amortized time if you make many changes and many lookups.

Initially, you sort the array. Then you extract the sqrt(n) smallest, the sqrt(n) largest, and the sqrt(n) values around the median. So you will know "there are n1 elements <= x1" (where n1 is about sqrt(n) and x1 is the n1-smallest element), "there are n2 elements <= x2" (where n2 is about n/2 - sqrt(n)/2, and x2 is the n2-smallest element), "there are n3 elements <= x3" (where n3 is about n/2 + sqrt(n)/2, and x3 is the n3-smallest element) and "there are n4 elements <= x4" (where n4 is close to n - sqrt(n) and x4 is the n4-smallest element). You keep track of elements number 0 to n1, n2 to n3, and n4 to n-1.

To get the minimum, maximum, or median, you need to examine about sqrt (n) elements.

When you change an array element, or insert an element, you adjust the values n1, n2, n3 and n4 according to what has happened, along with the lists of elements 0 to n1, n2 to n3, and n4 to n-1. After too many changes, n1, n2, n3 and n4 will have drifted so far that the search for the min, max or median either takes too long or no longer works. When that happens, you sort the data again.

BTW, I think the idea with heaps will work better.

gnasher729

If you have A LOT of memory, you can use two heaps to keep track of the median and two variables to store the min/max:

Pros:

  • Minimum and maximum can be updated in O(1);
  • Median can be updated in O(log N);

Cons:

  • You will need additional memory: N for the two heaps (N/2 each) plus N for your vector, i.e. 2N memory instead of N in your O(N) solution.

There are built-in STL functions in C++ that help you build and update the heaps easily: std::make_heap, std::push_heap and std::pop_heap.

More details on how to use two heaps to find the median are described here.

Well, you can also get trickier and use two vectors to store your data. In that case you need only N memory, but your data will be split across two vectors.
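A minimal sketch of the two-heap idea using `std::push_heap` and `std::pop_heap` on two `std::vector<double>`s, assuming insert-only updates (these functions cannot delete an arbitrary element, so this sketch does not handle removals). The lower half is a max-heap and the upper half a min-heap, kept balanced so that the top of the lower heap is the element at index (n-1)/2 of the sorted data. Note that this tracks the three values but does not lay the data out in one array as the question asks. The class name `RunningMedian` is hypothetical.

```cpp
#include <algorithm>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical insert-only tracker of min, max and (lower) median.
class RunningMedian {
public:
    void insert(double x) {
        min_ = std::min(min_, x);  // O(1) min/max updates (no deletions)
        max_ = std::max(max_, x);
        if (lower_.empty() || x <= lower_.front()) {
            lower_.push_back(x);
            std::push_heap(lower_.begin(), lower_.end());  // max-heap
        } else {
            upper_.push_back(x);
            std::push_heap(upper_.begin(), upper_.end(), std::greater<double>());  // min-heap
        }
        // Rebalance: lower_ keeps ceil(n/2) elements, upper_ keeps floor(n/2).
        if (lower_.size() > upper_.size() + 1) {
            std::pop_heap(lower_.begin(), lower_.end());
            upper_.push_back(lower_.back());
            lower_.pop_back();
            std::push_heap(upper_.begin(), upper_.end(), std::greater<double>());
        } else if (upper_.size() > lower_.size()) {
            std::pop_heap(upper_.begin(), upper_.end(), std::greater<double>());
            lower_.push_back(upper_.back());
            upper_.pop_back();
            std::push_heap(lower_.begin(), lower_.end());
        }
    }
    // Element at index (n-1)/2 of the sorted data; requires at least one insert.
    double median() const { return lower_.front(); }
    double minimum() const { return min_; }
    double maximum() const { return max_; }

private:
    std::vector<double> lower_;  // smaller half, max-heap
    std::vector<double> upper_;  // larger half, min-heap
    double min_ = std::numeric_limits<double>::infinity();
    double max_ = -std::numeric_limits<double>::infinity();
};
```

Each insert costs O(log N) for the heap operations, and reading the median, minimum or maximum is O(1).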

FalconUA
  • I think you need to explain how – Othman Benchekroun Apr 24 '15 at 15:05
  • You don't actually need additional memory for the two heaps, you can do this using the same input array, storing the lower heap in the beginning and the upper heap in the end. – Juan Lopes Apr 24 '15 at 15:08
  • This doesn't actually solve all of the OPs problems - he still need an array where elements `1` through `(size/2)` are less than the median and elements `(size/2)` through `size` are more. – David says Reinstate Monica Apr 24 '15 at 15:13
  • @DavidGrinberg Yeah, he will need A LOT of additional memory, or can get a little bit tricky and use those heaps to store elements instead of storing in one array. – FalconUA Apr 24 '15 at 15:15
  • The link I provide explains how you can keep track of the median using a min and a max heap which should cover all OPs problems – Samy Arous Apr 24 '15 at 15:18

O(n) seems to be the best you can do to arrange the array and the way you do it seems to be fine.

In your last paragraph you state that you do something with the array so that it is in a random order again, but no new values are added. Maybe instead of creating a heap, as some suggested, would it not be sufficient to just make a copy of the arranged array?

Pro: you have to arrange the array only once.

Con: you need twice the memory.

the
  • Read the question properly: the asker also provides his own O(n) solution, which is much faster than `std::sort` – FalconUA Apr 24 '15 at 14:55
  • No, I just want the min/max/median in the right place, using `std::nth_element`, I already can finish the job in O(n) while sorting the array would take `O(nlgn)` time. – Alaya Apr 24 '15 at 14:56
  • So, you already know the min, max and median? The problem is just, to get the elements in the right order? – the Apr 24 '15 at 15:00
  • `have to sort the vector` no you don't – UmNyobe Apr 24 '15 at 15:01

You definitely cannot do it faster than O(N), because you have to look through all the elements even to find the minimum.

You can, of course, talk about optimizing the code within O(N) complexity (that is, optimizing the constant in front of N).

Or will you be doing the same operation several times on a slightly modified array?

Petr
  • You can do better than O(N) using a heap. Yes, you would need to look through all the elements once, but after that each insertion is only O(log N) and retrieval is O(1). Of course, it depends on whether insertions are more frequent than reads or the other way around, but it is possible! – Samy Arous Apr 24 '15 at 15:10
  • @SamyArous, the question author has never said anything about insertions or retrievals. What he asked is: given an array, rearrange it. You cannot do that faster than O(N). – Petr Apr 24 '15 at 15:12
  • @SamyArous if he needs to run it several times on slightly modified array, that's what my last paragraph is about. – Petr Apr 24 '15 at 15:13
  • Yes he basically said they will need to do the rearrangement many times which for me implies insertion + deletion. Missed your last paragraph, sorry about that. – Samy Arous Apr 24 '15 at 15:16