37

I'm a programming student, and for a project I'm working on, on of the things I have to do is compute the median value of a vector of int values. I'm to do this using only the sort function from the STL and vector member functions such as .begin(), .end(), and .size().

I'm also supposed to make sure I find the median whether the vector has an odd number of values or an even number of values.

And I'm Stuck, below I have included my attempt. So where am I going wrong? I would appreciate if you would be willing to give me some pointers or resources to get going in the right direction.

Code:

int CalcMHWScore(const vector<int>& hWScores)
{
     const int DIVISOR = 2;
     double median;
     sort(hWScores.begin(), hWScores.end());
     if ((hWScores.size() % DIVISOR) == 0)
     {
         median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);
     }
     else 
     {
       median = ((hWScores.begin() + hWScores.size()) / DIVISOR)
     }

    return median;
}

Thanks!!

Bill the Lizard
  • 369,957
  • 201
  • 546
  • 842
Alex
  • 58,815
  • 45
  • 146
  • 176
  • 8
    I'm not sure the use of a named constant for "2" is appropriate here. – Anon. Jan 22 '10 at 03:48
  • @Max - Thanks for the catch, I tagged it. – Alex Jan 22 '10 at 03:50
  • For max happy, I reformmatted your code. I also fixed a few paren issues. – Paul Nathan Jan 22 '10 at 03:52
  • 2
    You'll probably get a many-lines-long error message ultimately referring to the "sort" line. That's because the input parameter to your function is `const` and `sort` is trying to modify its contents. Change that by passing `hWScores` by value instead of by const reference. – Rob Kennedy Jan 22 '10 at 07:21
  • 1
    tell your teacher about partial_sort as it can be used to find the median in O(n) time. no need for any of these fancy odd/even length checks people have been suggesting. –  Jan 22 '10 at 09:58
  • 1
    Darid, using partial_sort will still run in O(n log n) time, you'll still need to figure out which iterator to use for the middle, and you'll still need to average the two middle values if the length is even. – Rob Kennedy Jan 23 '10 at 15:53

6 Answers6

66

There is no need to completely sort the vector: std::nth_element can do enough work to put the median in the correct position. See my answer to this question for an example.

Of course, that doesn't help if your teacher forbids using the right tool for the job.

Community
  • 1
  • 1
Mike Seymour
  • 235,407
  • 25
  • 414
  • 617
  • 8
    In fact the `nth_element` approach should be used instead of sorting since the former only takes O(n) time while the latter takes O(n log n). – kennytm Jan 22 '10 at 12:00
  • 2
    As discussed your solution works only if "size" is not even. – Anonymous Sep 01 '15 at 12:23
  • 2
    @Anonymous, `nth_element` still works for even size. You just need to call `nth_element` twice, to put both 'central' elements into place. – Aaron McDaid Apr 25 '16 at 11:25
  • @AaronMcDaid does that really work though? The second call to `nth_element` could change the position of the value determined by the first call. The array would never have both elements in their proper place at the same time; you'd need to save the first value before doing the second call. – Mark Ransom Jul 31 '17 at 18:22
  • @MarkRansom, as you said, the value from the first call can be stored. After the second call to the `nth_element` function, you can store the second value and calculate the median. You don't need these values to be in the right place in the vector. – Marek Wawrzos Oct 16 '19 at 07:29
  • @MarkRansom It obviously will work, even if the function reorders the array, because the concept of 'median'/"nth_element' indeed does not depend on elements' order. The array represents just a set here. – Jaromír Adamec May 25 '20 at 12:33
  • @JaromírAdamec if you look at the code in the question you'll notice it isn't using the classic definition of "median", it's averaging the *two* points in the middle. You need to use an algorithm that can deliver both of those points. – Mark Ransom May 25 '20 at 16:00
  • @MarkRansom That's ok. Median is sometimes defined this way. I thought that you were worried that one cannot call the function `nth_element` twice on the same array. You can, provided you copy the result of the first call out of the array, and you compute the average between this an the resulting elment of the second call. Or perhaps I just haven't got your point. – Jaromír Adamec May 26 '20 at 18:42
  • @JaromírAdamec I was just concerned that the answer itself left out that important detail. Also concerned that Aaron McDaid made a statement that wasn't quite true, although I'm still unsure about that - does `nth_element` keep the existing order of elements that aren't nth? – Mark Ransom May 26 '20 at 18:51
32

You are doing an extra division and overall making it a bit more complex than it needs to be. Also, there's no need to create a DIVISOR when 2 is actually more meaningful in context.

double CalcMHWScore(vector<int> scores)
{
  size_t size = scores.size();

  if (size == 0)
  {
    return 0;  // Undefined, really.
  }
  else
  {
    sort(scores.begin(), scores.end());
    if (size % 2 == 0)
    {
      return (scores[size / 2 - 1] + scores[size / 2]) / 2;
    }
    else 
    {
      return scores[size / 2];
    }
  }
}
Dash
  • 15,988
  • 6
  • 44
  • 48
Max Shawabkeh
  • 34,985
  • 8
  • 79
  • 89
4
const int DIVISOR = 2;

Don't do this. It just makes your code more convoluted. You've probably read guidelines about not using magic numbers, but evenness vs. oddness of numbers is a fundamental property, so abstracting this out provides no benefit but hampers readability.

if ((hWScores.size() % DIVISOR) == 0)
{
    median = ((hWScores.begin() + hWScores.size()) + (hWScores.begin() + (hWScores.size() + 1))) / DIVISOR);

You're taking an iterator to the end of the vector, taking another iterator that points one past the end of the vector, adding the iterators together (which isn't an operation that makes sense), and then dividing the resulting iterator (which also doesn't make sense). This is the more complicated case; I'll explain what to do for the odd-sized vector first and leave the even-sized case as an exercise for you.

}
else 
{
    median = ((hWScores.begin() + hWScores.size()) / DIVISOR)

Again, you're dividing an iterator. What you instead want to do is to increment an iterator to the beginning of the vector by hWScores.size() / 2 elements:

    median = *(hWScores.begin() + hWScores.size() / 2);

And note that you have to dereference iterators to get values out of them. It'd be more straightforward if you used indices:

    median = hWScores[hWScores.size() / 2];
jamesdlin
  • 48,496
  • 10
  • 105
  • 134
4

I give below a sample program that is somewhat similar to the one in Max S.'s response. To help the OP advance his knowledge and understanding, I have made a number of changes. I have:

a) changed the call by const reference to call by value, since sort is going to want to change the order of the elements in your vector, (EDIT: I just saw that Rob Kennedy also said this while I was preparing my post)

b) replaced size_t with the more appropriate vector<int>::size_type (actually, a convenient synonym of the latter),

c) saved size/2 to an intermediate variable,

d) thrown an exception if the vector is empty, and

e) I have also introduced the conditional operator (? :).

Actually, all of these corrections are straight out of Chapter 4 of "Accelerated C++" by Koenig and Moo.

double median(vector<int> vec)
{
        typedef vector<int>::size_type vec_sz;

        vec_sz size = vec.size();
        if (size == 0)
                throw domain_error("median of an empty vector");

        sort(vec.begin(), vec.end());

        vec_sz mid = size/2;

        return size % 2 == 0 ? (vec[mid] + vec[mid-1]) / 2 : vec[mid];
}
Alexandros Gezerlis
  • 2,191
  • 15
  • 20
3

The accepted answer uses std::sort which does more work than we need it to. The answers that use std::nth_element don't handle the even size case correctly.


We can do a little better than just using std::sort. We don't need to sort the vector completely in order to find the median. We can use std::nth_element to find the middle element. Since the median of a vector with an even number of elements is the average of the middle two, we need to do a little more work to find the other middle element in that case. std::nth_element ensures that all elements preceding the middle are less than the middle. It doesn't guarantee their order beyond that so we need to use std::max_element to find the largest element preceding the middle element.

int CalcMHWScore(std::vector<int> hWScores) {
  assert(!hWScores.empty());
  const auto middleItr = hWScores.begin() + hWScores.size() / 2;
  std::nth_element(hWScores.begin(), middleItr, hWScores.end());
  if (hWScores.size() % 2 == 0) {
    const auto leftMiddleItr = std::max_element(hWScores.begin(), middleItr);
    return (*leftMiddleItr + *middleItr) / 2;
  } else {
    return *middleItr;
  }
}

You might want to consider returning a double because the median may be a fraction when the vector has an even size.

Indiana Kernick
  • 4,485
  • 2
  • 19
  • 40
0

I'm not exactly sure what your restrictions on the user of member functions of vector are, but index access with [] or at() would make accessing elements simpler:

median = hWScores.at(hWScores.size() / 2);

You can also work with iterators like begin() + offset like you are currently doing, but then you need to first calculate the correct offset with size()/2 and add that to begin(), not the other way around. Also you need to dereference the resulting iterator to access the actual value at that point:

median = *(hWScores.begin() + hWScores.size()/2)
sth
  • 200,334
  • 49
  • 262
  • 354