I wish to compute the median value from an array of floats in C++:

float Median( FloatArray const * constFloatArray )
{
    FloatArray    scratch = FloatArray( *constFloatArray );
    int64_t const size    = scratch.GetWidth() * scratch.GetHeight();
    int64_t const mid     = size / 2;

    std::nth_element( scratch.begin(), scratch.begin() + mid, scratch.end() );

    return scratch[ mid ];
}

The FloatArray contains a regular C++ array of floats.

I'm using std::nth_element, but is there a similar facility that works on const data? Right now I copy the array, run nth_element on the copy, and then throw the copy away. If nothing like nth_element exists for const data, is there a more efficient approach that uses the copy step itself to compute useful information and so avoids an additional O(n) pass? Or is the performance impact negligible? My array could hold on the order of 2 billion elements.

WilliamKF
  • You might want to look into some other algorithm that doesn't modify the source like: https://stackoverflow.com/questions/10930732/c-efficiently-calculating-a-running-median – kmdreko Sep 06 '19 at 19:47
  • You might be able to adjust the `nth_element` algorithm to also copy the elements (https://stackoverflow.com/questions/29145520/how-is-nth-element-implemented), but I doubt it will be significantly faster. It would be interesting to see the results if you make a performance comparison. – nielsen Sep 06 '19 at 21:19
  • It’s not possible to select the median without altering the input with o(*n*) extra space, so it’s not a natural candidate for a non-modifying algorithm; nor is there an obvious use for the copy. – Davis Herring Sep 06 '19 at 22:38
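For reference, the copy-then-select approach from the question can be sketched for plain const data. This is a minimal sketch, not the asker's actual code: it assumes a `std::vector<float>` in place of `FloatArray`, and `MedianOfCopy` is a hypothetical name. Unlike the question's version, it averages the two middle values when the element count is even.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: median of read-only data using one copy plus
// std::nth_element. std::vector<float> stands in for FloatArray (assumption).
float MedianOfCopy(const std::vector<float>& data)
{
    if (data.empty())
        throw std::invalid_argument("median of an empty range");

    std::vector<float> scratch(data);  // the single O(n) copy; data is untouched
    const std::size_t mid = scratch.size() / 2;

    // Afterwards scratch[mid] holds the value it would have in sorted order,
    // and no element before it is greater than scratch[mid].
    std::nth_element(scratch.begin(), scratch.begin() + mid, scratch.end());
    const float upper = scratch[mid];

    if (scratch.size() % 2 == 1)
        return upper;

    // Even count: the lower middle value is the largest element left of mid.
    const float lower = *std::max_element(scratch.begin(), scratch.begin() + mid);
    return lower + (upper - lower) / 2.0f;
}
```

The extra `std::max_element` pass only touches the lower half and only runs for even-sized input, so the whole function stays linear on average.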

1 Answer

I'm not sure whether it will be more efficient, but you can save half of the copying by using std::partial_sort_copy. It copies only the smallest half of the data (plus one element) into a new array, sorting it as it goes. Then all you need is the last element of the result if the input has an odd number of elements, or the average of the last two if it has an even number. Using a vector, that would look like:

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> v{5, 6, 4, 3, 2, 6, 7, 9, 3, 10};
    // Room for the smallest half of v plus one element.
    std::vector<int> r(v.size() / 2 + 1);
    std::partial_sort_copy(v.begin(), v.end(), r.begin(), r.end());
    // The parity of the input, not of r, decides which case applies.
    if (v.size() % 2)
        std::cout << r.back();
    else
        std::cout << (r[r.size() - 1] + r[r.size() - 2]) / 2.0;
}
NathanOliver
  • As I understood it, `nth_element` is surprisingly only O(n), so wouldn't doing a sort bump it to O(n*log(n))? – WilliamKF Sep 06 '19 at 20:16
  • @WilliamKF It will actually be `O(Nlog(N/2))`, but yes, it is more algorithmically complex. It does save half the space, though, and makes half the copies, so it could work out to be better, but that is something you need to benchmark on your data set. – NathanOliver Sep 06 '19 at 20:18
  • @NathanOliver: The /2 makes no difference inside the O. – Davis Herring Sep 06 '19 at 22:39
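The answer's idea can also be wrapped as a function over const float data. This is a hedged sketch: `MedianByPartialSortCopy` is a hypothetical name and `std::vector<float>` is assumed in place of `FloatArray`. The odd/even decision is based on the size of the input, not of the half-size buffer. As the comments point out, this does more comparisons than `nth_element`, so whether the smaller buffer pays off is something to benchmark.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: median via std::partial_sort_copy into a buffer that
// holds only the smallest n/2 + 1 elements. The input is never modified.
double MedianByPartialSortCopy(const std::vector<float>& data)
{
    if (data.empty())
        throw std::invalid_argument("median of an empty range");

    const std::size_t n = data.size();
    std::vector<float> lowerHalf(n / 2 + 1);

    // Fills lowerHalf with the smallest n/2 + 1 values of data, in sorted order.
    std::partial_sort_copy(data.begin(), data.end(),
                           lowerHalf.begin(), lowerHalf.end());

    if (n % 2 == 1)
        return lowerHalf.back();  // sorted position n/2 is the median

    // Even count: average the values at sorted positions n/2 - 1 and n/2.
    return (static_cast<double>(lowerHalf[n / 2 - 1]) + lowerHalf[n / 2]) / 2.0;
}
```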