
How to calculate the median of a numeric array has been discussed before; see, for example, the question "What is the right approach when using STL container for median calculation?". My question is different: how can you get the index of the median in the original STL container? To illustrate, here is an example:

std::vector<int> myarray;
myarray.push_back(3);
myarray.push_back(1);
myarray.push_back(100);
myarray.push_back(20);
myarray.push_back(200);
int n = myarray.size() / 2;
// partially reorders myarray so that myarray[n] holds the median
std::nth_element(myarray.begin(), myarray.begin() + n, myarray.end());
int median = myarray[n];

In the code above I can get the median value (20), but not its position in the original vector (the 4th element, i.e. index 3). Any ideas? Thanks!

feelfree
    Why do you assume that the median is one of the elements in the vector? – Eitan T Jul 19 '12 at 07:59
  • `nth_element` gives you an iterator to the median if used correctly (assuming an odd-length array). With the iterator and `std::distance` you get what you want. See my answer below. – juanchopanza Jul 19 '12 at 08:11
  • @EitanT Here I just give an example where the number of elements is odd. Extension to the case where the number of elements is even is straightforward. – feelfree Jul 19 '12 at 08:43
  • @juanchopanza Thanks for your suggestion anyway – feelfree Jul 19 '12 at 08:44

4 Answers


I think there is no straightforward way to do that.

The vector you passed to nth_element has been partially reordered, so searching it for the median will simply return n (or an earlier position, if the median value occurs more than once).

You need to save a copy of your original vector, and search in that. Keep in mind that if the original vector contained duplicates, you will not know exactly which of them was actually moved to position n (if this is of any relevance for you).

As an alternative, you could have a look at the implementation of nth_element, and implement your own version that also reports the original position of the found n-th element.

Timbo

If it is acceptable to search for the element,

vector<int>::iterator itOfMedian = std::find(myarray.begin(), myarray.end(), median);
int index = itOfMedian - myarray.begin();

should do the trick.

EDIT

It seems you have a point here: nth_element reorders the vector it is given. Therefore, run it on a copy and search the original:

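std::vector<int> myArrayCopy = myarray;
// run nth_element on the copy so that myarray keeps its original order
std::size_t n = myArrayCopy.size() / 2;
std::nth_element(myArrayCopy.begin(), myArrayCopy.begin() + n, myArrayCopy.end());
int median = myArrayCopy[n];
// search the untouched original for the median value
std::vector<int>::iterator itOfMedian = std::find(myarray.begin(), myarray.end(), median);
int index = itOfMedian - myarray.begin();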
juanchopanza
Matthias
  • No need to use find. `nth_element` gives an iterator to the element. OP is just not getting that iterator back. – juanchopanza Jul 19 '12 at 08:12
  • I have tried, but failed. The reason is that calling nth_element also reorders the myarray vector. – feelfree Jul 19 '12 at 08:17
    @feelfree sorry, I misread the question. If you want the index in the original vector, then do as above, but using a copy of the original before the call to nth_element. – juanchopanza Jul 19 '12 at 08:28

You can use std::nth_element to find an iterator to the median element. However, this does a partial sorting of the vector, so you would need to use a copy:

  std::vector<int> dataCopy = myarray;
  // we will use iterator middle later
  std::vector<int>::iterator middle = dataCopy.begin() + (dataCopy.size() / 2);
  // reorders dataCopy so that *middle is the median (middle itself is unchanged)
  std::nth_element(dataCopy.begin(), middle, dataCopy.end());
  int nthValue = *middle;

Now it gets complicated. You have a value corresponding to the median. You can search the original vector for it, and use std::distance to get the index:

std::vector<int>::iterator it = std::find(myarray.begin(), myarray.end(), nthValue);
std::vector<int>::size_type pos = std::distance(myarray.begin(), it);

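However, this only works reliably if there are no duplicates of nthValue in myarray; otherwise std::find returns the first matching element, which is not necessarily the one nth_element selected.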

juanchopanza
  • std::distance(data.begin(), middle) will always be (data.size() / 2). Calling nth_element doesn't change the iterators, it changes the data. – Timbo Jul 19 '12 at 08:23

Sorry to dig up an old topic, but here's a nice way to do it. Exploit the fact that std::pair compares by its first element (and only then by its second), so nth_element orders pairs by value. With this in mind, create a vector of pairs where the first part of each pair is the value that participates in the median calculation, and the second is its index. Modifying your example:

vector<pair<unsigned int, size_t>> myarray;
myarray.push_back(pair<unsigned int, size_t>(  3, 0));
myarray.push_back(pair<unsigned int, size_t>(  1, 1));
myarray.push_back(pair<unsigned int, size_t>(100, 2));
myarray.push_back(pair<unsigned int, size_t>( 20, 3));
myarray.push_back(pair<unsigned int, size_t>(200, 4));

int n = myarray.size()/2;
nth_element(myarray.begin(), myarray.begin()+n, myarray.end());

unsigned int median = myarray[n].first;
size_t medianindex = myarray[n].second;

Of course, myarray has been rearranged, so myarray[medianindex] is not necessarily the median. However, medianindex is the median's position in the original order, so if you made a copy before calling nth_element, medianindex is the desired index into that copy.

nonbasketless
  • I don't see the point in storing the indices if you can get the median index in O(1) with methods shown in other answers. – juanchopanza Nov 25 '17 at 12:49
  • As of this writing none of the other methods give an O(1) method for searching the median index. – nonbasketless Sep 30 '19 at 17:42
  • ...no, they don't. Your example, for instance, relies on std::find, which has linear complexity. – nonbasketless Sep 30 '19 at 20:24
  • Heh, no probs :). My method trades that final O(n) search for a trivial O(1) operation at the expense of allocating and moving O(n) extra memory around. Whether it's better might be system dependent, but I'd guess the other methods are still faster. I like it because it's really simple. It does have the potential benefit of selecting the middle-most element if there are duplicates (because comparison on a pair works on .first, then .second). – nonbasketless Oct 03 '19 at 18:18