3

I am trying to calculate the median of a vector called median:

std::nth_element(median.begin(), median.begin() + median.size() / 2, median.end());
medianVal = median[median.size() / 2];
std::cout << "The median is " << medianVal << std::endl;

This works fine. But I need to get the position of the median value in its original vector. How can I do this very fast?

Vincent Savard
black

3 Answers

4

I am assuming you do not want to reorder the original container. If wrong, there are easier ways.

nth_element takes a comparator.

So first create a vector of iterators into the original container, then write a comparator that takes two iterators, dereferences them, and compares the results.

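#include <algorithm>
#include <iterator>
#include <vector>

// Returns an iterator to the median element of c without reordering c.
template<class C>
auto median(C const& c){
  using std::begin; using std::end;
  auto start = begin(c);
  auto finish = end(c);
  using iterator = decltype(start);
  // Collect an iterator to every element; only this vector gets reordered.
  std::vector<iterator> working;
  for(auto it = start; it != finish; ++it)
    working.push_back(it);
  if (working.empty())
    return start;  // empty input: nothing to select
  auto mid = begin(working) + working.size() / 2;
  std::nth_element(
      begin(working), mid, end(working),
      // Compare the elements the iterators point at, not the iterators.
      [](iterator lhs, iterator rhs){
          return *lhs < *rhs;
      }
  );
  return *mid;
}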

This does rely on some C++14 (auto return type deduction), but every major compiler (except possibly icc?) has support for it now.

It is flexible enough to work even on C-style arrays, and I think it even works with sentinels.
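If a position is wanted rather than an iterator, the same idea can be sketched with indices in place of iterators. This is only an illustration for `std::vector`; the name `median_index` is hypothetical and not part of the answer above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the index in `v` of the element that
// std::nth_element would place at position v.size() / 2.
// `v` is left untouched; only the vector of indices is reordered.
template <class T>
std::size_t median_index(std::vector<T> const& v) {
    assert(!v.empty());  // assumption: the caller passes a non-empty vector
    std::vector<std::size_t> idx(v.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    auto mid = idx.begin() + idx.size() / 2;
    std::nth_element(idx.begin(), mid, idx.end(),
                     // Order the indices by the values they refer to.
                     [&v](std::size_t lhs, std::size_t rhs) {
                         return v[lhs] < v[rhs];
                     });
    return *mid;
}
```

With the iterator version, `std::distance(begin(c), median(c))` should give the same position for a non-empty container.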


Yakk - Adam Nevraumont
2

According to the docs (http://en.cppreference.com/w/cpp/algorithm/nth_element) the function you're using will actually reorder the array, partially.

You would need to keep a copy of the original and step through it to find an element matching the median.
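A minimal sketch of that copy-and-search approach, assuming a non-empty `std::vector` whose elements support `<` and `==`. The name `median_index_by_search` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical helper: select the median in a scratch copy, then search
// the untouched original for the first element equal to it.
template <class T>
std::size_t median_index_by_search(std::vector<T> const& original) {
    assert(!original.empty());  // assumption: non-empty input
    std::vector<T> scratch(original);  // nth_element reorders this copy only
    auto mid = scratch.begin() + scratch.size() / 2;
    std::nth_element(scratch.begin(), mid, scratch.end());
    // Linear scan; with duplicate values this finds the first occurrence.
    auto pos = std::find(original.begin(), original.end(), *mid);
    return static_cast<std::size_t>(std::distance(original.begin(), pos));
}
```

Both the selection and the search are linear on average, so the extra pass does not change the overall complexity, only the constant factor and the memory for the copy.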

Another way to get it done is to have a vector of tuples, where the index is simply stored as the second member of the tuple. Of course, you'd still be stepping through the vector at some point.
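The tuple idea might look like the following sketch, which uses `std::pair` as the two-element tuple. The name `median_with_index` is hypothetical, and a non-empty input is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: tag every value with its original index, select the
// median of the tagged copy, and return both the value and the index.
template <class T>
std::pair<T, std::size_t> median_with_index(std::vector<T> const& v) {
    assert(!v.empty());  // assumption: non-empty input
    std::vector<std::pair<T, std::size_t>> tagged;
    tagged.reserve(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        tagged.emplace_back(v[i], i);  // this is the one pass over the data
    auto mid = tagged.begin() + tagged.size() / 2;
    // std::pair compares by value first, then by index.
    std::nth_element(tagged.begin(), mid, tagged.end());
    return *mid;
}
```

Because the index travels with the value, no second search through the original vector is needed afterwards.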

Carlos
1

It is hard to know what you mean by "do this very fast" without knowing the exact nature of your problem or the number of elements in the data series. That said, you might like to look at the "heap median" (also called "rolling median" or "streaming median") algorithm, which is described in several other questions on Stack Overflow.

With this method, you can store the index of the current candidate median value without needing to iterate over the original array of data again to find the index of the median. You don't need to modify the order of the original container either.
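The linked descriptions are not reproduced here, so the following is only a generic two-heap sketch of the idea and not necessarily the exact algorithm they describe. The class name `RunningMedian` is hypothetical, and `int` values are assumed for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical two-heap running median that remembers original positions.
class RunningMedian {
public:
    using Entry = std::pair<int, std::size_t>;  // (value, original index)

    void push(int value) {
        Entry e{value, count_++};
        if (upper_.empty() || e >= upper_.top())
            upper_.push(e);
        else
            lower_.push(e);
        // Keep upper_ the same size as lower_ or exactly one larger.
        if (upper_.size() > lower_.size() + 1) {
            lower_.push(upper_.top());
            upper_.pop();
        } else if (lower_.size() > upper_.size()) {
            upper_.push(lower_.top());
            lower_.pop();
        }
    }

    // The element at sorted position n / 2, together with its index.
    Entry median() const {
        assert(!upper_.empty());  // assumption: at least one push happened
        return upper_.top();
    }

private:
    std::priority_queue<Entry> lower_;  // max-heap holding the smaller half
    std::priority_queue<Entry, std::vector<Entry>,
                        std::greater<Entry>> upper_;  // min-heap, larger half
    std::size_t count_ = 0;  // index assigned to the next pushed value
};
```

Each `push` costs O(log n), so this pays off mainly when values arrive incrementally and the median is needed after every insertion. For a single median of a vector that is already in memory, `std::nth_element` is linear on average.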

Community
learnvst