61

I need to write a function that accepts an array of decimals and finds the median.

Is there a function in the .NET Math library?

abatishchev
WingMan20-10

12 Answers

72

It looks like the other answers rely on sorting. That is not optimal from a performance point of view, because sorting takes O(n log n) time, whereas the median can be computed in O(n) expected time. The generalized version of this problem is known as the "n-th order statistic": finding an element K in a set such that n elements are smaller than or equal to K and the rest are larger than or equal to K. So the 0th order statistic is the minimal element in the set (note: some literature indexes from 1 to N instead of 0 to N-1). The median is simply the (Count-1)/2-th order statistic.

Below is code adapted from Introduction to Algorithms by Cormen et al., 3rd Edition.

/// <summary>
/// Partitions the given list around a pivot element such that all elements to the left of the pivot are <= pivot
/// and the ones to the right are > pivot. This method can be used for sorting and for N-order statistics such
/// as median-finding algorithms.
/// The pivot is selected randomly if a random number generator is supplied; otherwise the last element in the list is used.
/// Reference: Introduction to Algorithms 3rd Edition, Cormen et al, pp 171
/// </summary>
private static int Partition<T>(this IList<T> list, int start, int end, Random rnd = null) where T : IComparable<T>
{
    if (rnd != null)
        list.Swap(end, rnd.Next(start, end+1));

    var pivot = list[end];
    var lastLow = start - 1;
    for (var i = start; i < end; i++)
    {
        if (list[i].CompareTo(pivot) <= 0)
            list.Swap(i, ++lastLow);
    }
    list.Swap(end, ++lastLow);
    return lastLow;
}

/// <summary>
/// Returns the Nth smallest element from the list. Here n starts from 0, so n=0 returns the minimum, n=1 returns the 2nd smallest element, etc.
/// Note: the specified list is mutated in the process.
/// Reference: Introduction to Algorithms 3rd Edition, Cormen et al, pp 216
/// </summary>
public static T NthOrderStatistic<T>(this IList<T> list, int n, Random rnd = null) where T : IComparable<T>
{
    return NthOrderStatistic(list, n, 0, list.Count - 1, rnd);
}
private static T NthOrderStatistic<T>(this IList<T> list, int n, int start, int end, Random rnd) where T : IComparable<T>
{
    while (true)
    {
        var pivotIndex = list.Partition(start, end, rnd);
        if (pivotIndex == n)
            return list[pivotIndex];

        if (n < pivotIndex)
            end = pivotIndex - 1;
        else
            start = pivotIndex + 1;
    }
}

public static void Swap<T>(this IList<T> list, int i, int j)
{
    if (i == j)   // Not required for correctness, but Partition may make many calls, so this is here for performance
        return;
    var temp = list[i];
    list[i] = list[j];
    list[j] = temp;
}

/// <summary>
/// Note: specified list would be mutated in the process.
/// </summary>
public static T Median<T>(this IList<T> list) where T : IComparable<T>
{
    return list.NthOrderStatistic((list.Count - 1)/2);
}

public static double Median<T>(this IEnumerable<T> sequence, Func<T, double> getValue)
{
    var list = sequence.Select(getValue).ToList();
    var mid = (list.Count - 1) / 2;
    return list.NthOrderStatistic(mid);
}

A few notes:

  1. This code replaces the tail-recursive code from the book's original version with an iterative loop.
  2. It also eliminates the original version's unnecessary extra check for start==end.
  3. I've provided two versions of Median, one of which accepts an IEnumerable and then creates a list. If you use the version that accepts an IList, keep in mind that it modifies the order of the list.
  4. The above methods calculate the median, or any i-th order statistic, in O(n) expected time. If you want O(n) worst-case time, there is a technique called median-of-medians. While this improves worst-case performance, it degrades the average case because the constant in O(n) is larger. However, if you mostly calculate medians of very large data sets, it is worth a look.
  5. The NthOrderStatistic method accepts a random number generator, which is then used to choose a random pivot during partitioning. This is generally not necessary unless you know your data has patterns that make the last element a poor pivot, or your code is exposed to outside callers who could craft inputs for targeted exploitation.
  6. The definition of the median is clear if you have an odd number of elements: it is just the element with index (Count-1)/2 in the sorted array. But when you have an even number of elements, (Count-1)/2 is no longer an integer and you have two medians: the lower median at index Math.Floor((Count-1)/2.0) and the upper median at index Math.Ceiling((Count-1)/2.0). Some textbooks use the lower median as "standard" while others propose using the average of the two. This question becomes particularly critical for a set of 2 elements. The above code returns the lower median. If you want the average of the lower and upper medians instead, you need to call the above code twice. In that case, measure performance on your data to decide whether to use the above code or just plain sorting.
  7. For .NET 4.5+ you can add the [MethodImpl(MethodImplOptions.AggressiveInlining)] attribute to the Swap<T> method for slightly improved performance.
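The averaging case from note 6 can also be handled without a second full selection pass. The following is a minimal sketch under stated assumptions, not the code above: `MedianSketch`, `SelectInPlace`, and `AverageMedian` are hypothetical names, it handles `double` only, and it copies the input so the caller's list is not reordered. It relies on the fact that once selection has placed the lower median at its index, every element to the right of that index is greater than or equal to it, so the upper median is simply the minimum of that right-hand part.

```csharp
using System;
using System.Collections.Generic;

public static class MedianSketch
{
    // Lomuto partition around a randomly chosen pivot; returns the pivot's final index.
    private static int Partition(IList<double> list, int start, int end, Random rnd)
    {
        Swap(list, end, rnd.Next(start, end + 1)); // upper bound of Next is exclusive
        double pivot = list[end];
        int lastLow = start - 1;
        for (int i = start; i < end; i++)
        {
            if (list[i] <= pivot)
                Swap(list, i, ++lastLow);
        }
        Swap(list, end, ++lastLow);
        return lastLow;
    }

    private static void Swap(IList<double> list, int i, int j)
    {
        double temp = list[i];
        list[i] = list[j];
        list[j] = temp;
    }

    // Reorders the list so that list[n] holds the n-th smallest value (0-based),
    // everything left of n is <= list[n], and everything right of n is >= list[n].
    private static void SelectInPlace(IList<double> list, int n, Random rnd)
    {
        int start = 0, end = list.Count - 1;
        while (true)
        {
            int pivotIndex = Partition(list, start, end, rnd);
            if (pivotIndex == n) return;
            if (n < pivotIndex) end = pivotIndex - 1;
            else start = pivotIndex + 1;
        }
    }

    // Median as the average of the lower and upper medians (assumption: values are not NaN).
    public static double AverageMedian(IEnumerable<double> source)
    {
        var list = new List<double>(source); // work on a copy
        if (list.Count == 0)
            throw new ArgumentException("Median of an empty sequence is not defined.", nameof(source));

        int lower = (list.Count - 1) / 2;
        SelectInPlace(list, lower, new Random());
        if (list.Count % 2 == 1)
            return list[lower];

        // Even count: the upper median is the smallest element to the right of 'lower'.
        double upper = list[lower + 1];
        for (int i = lower + 2; i < list.Count; i++)
        {
            if (list[i] < upper) upper = list[i];
        }
        return (list[lower] + upper) / 2.0;
    }
}
```

For example, `MedianSketch.AverageMedian(new[] { 3.0, 1.0, 4.0, 2.0 })` returns 2.5. The extra scan over the right-hand part is a single linear pass, which is cheaper than a second selection, but measuring on your own data is still advisable.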
Wai Ha Lee
Shital Shah
  • @ShitalShah: re: 6, if I want to calculate the median with the average, instead of making 2 calls to NthOrderStatistic, can't I take advantage of the fact that the list is mutated and basically select the next item. I am not sure if the NthOrderStatistic method ends up sorting the list ascending or only a portion of it (depending on the data in the list ultimately). – costa Jun 22 '15 at 22:19
  • 1
    @costa - NthOrderStatistic does not guarantee that either half is sorted. The next item is also not guaranteed to be the next smaller or larger item. – Shital Shah Jun 23 '15 at 10:15
  • 1
    This came in very handy, thanks! I updated the methods to use C# 6 expression-bodied members and stuck in a gist, along with a standard deviation algorithm - https://gist.github.com/cchamberlain/478bf7a3411beb47abb6 – cchamberlain Aug 08 '15 at 07:59
  • 3
    I found two problems with the algorithm. First, replace `rnd.Next(start, end)` with `rnd.Next(start, end + 1)` to not preclude `end` from being a pivot. Second, if array contains many identical values, algorithm will become `O(n^2)`. To avoid that, add a check in `Partition()` to return `end` if `pivot` is same as `list[prevPivotIndex]`. – G. Cohen Nov 03 '16 at 17:49
  • @G. Cohen - Good catch for `rnd.Next(start, end+1)`. However I'm not sure about the returning end if pivot is same as last. I'll need to think about this one... – Shital Shah Nov 03 '16 at 18:21
  • You should use `<` instead of ` – Logerfo Dec 13 '17 at 18:24
  • I tried using the Median method, and it was dog slow. When I wrote another Median method that takes in a random number generator, that ran fast. Something is wrong with the pivoting without the RNG, I think. – Paul Chernoch Mar 26 '18 at 22:24
  • @costa The next item is not guaranteed to be sorted, but every element on the right-hand side of the pivot is guaranteed to be larger than or equal to the pivot. So the minimum element on the right hand side is the next sorted value. Finding the minimum of a half of a list is certainly faster than running the entire selection algorithm again. – relatively_random Sep 07 '20 at 09:41
43

Thanks Rafe. This version takes into account the issues raised in the replies to your answer.

public static double GetMedian(double[] sourceNumbers) {
    // Framework 2.0 version of this method. There is an easier way in F4.
    if (sourceNumbers == null || sourceNumbers.Length == 0)
        throw new System.Exception("Median of empty array not defined.");

    // Make sure the list is sorted, but use a new array.
    double[] sortedPNumbers = (double[])sourceNumbers.Clone();
    Array.Sort(sortedPNumbers);

    // Get the median.
    int size = sortedPNumbers.Length;
    int mid = size / 2;
    double median = (size % 2 != 0) ? sortedPNumbers[mid] : (sortedPNumbers[mid] + sortedPNumbers[mid - 1]) / 2;
    return median;
}
Jason Jakob
  • Why is the function a static here? – richieqianle Dec 23 '14 at 02:47
  • 2
    @richieqianle: Because everything what can be static should be static. It's more efficient from the perspective of [virtual functions table](http://stackoverflow.com/questions/2413483/virtual-method-tables). – abatishchev Feb 08 '15 at 05:51
  • @abatishchev A method isn't virtual by default in C# (in contrast to Java). But even if it *were*, performance is a really bad reason for making something static or not. A better reason - at least in this answer - might be that the method is some kind of utility method, where no instance of the class is needed. – HimBromBeere Nov 01 '17 at 16:46
  • @HimBromBeere: "where no instance of the class is needed" is basically equal to "everything what can be static should be static" – abatishchev Nov 01 '17 at 18:06
  • 1
    @abatishchev I agree, static is perfectly ok for this. – DavidGuaita Dec 12 '17 at 06:10
29

Math.NET is an open-source library that offers a method for calculating the median. The NuGet package is called MathNet.Numerics.

The usage is pretty simple:

using MathNet.Numerics.Statistics;

IEnumerable<double> data = new[] { 1.0, 2.0, 3.0, 4.0 }; // sample values; any sequence of doubles works
double median = data.Median();
NePh
25
decimal Median(decimal[] xs) {
  Array.Sort(xs);
  return xs[xs.Length / 2];
}

Should do the trick.

-- EDIT --

For those who want the full monty, here is the complete, short, pure solution (a non-empty input array is assumed):

decimal Median(decimal[] xs) {
  var ys = xs.OrderBy(x => x).ToList();
  double mid = (ys.Count - 1) / 2.0;
  return (ys[(int)(mid)] + ys[(int)(mid + 0.5)]) / 2;
}
Rafe
  • 10
    This is `O(n log n)`. It's possible to find the median in `O(n)` time. Also, this fails to return the median when the array has even length (it should be the average of the two middle elements after the array is sorted). – jason Nov 10 '10 at 02:43
  • 5
    Sure, but the question didn't mention O(n) as a requirement and, regarding the even or empty cases, they were left as an exercise for the student. – Rafe Nov 10 '10 at 22:03
  • 6
    Also this modifies the array you pass to the method, which is just silly. – Gleno Oct 20 '11 at 04:22
  • 5
    @Gleno, I rather think the spec. leaves all this open (well, I was interpreting 'function' in the C# sense, which can have side effects). The goal was simply to demonstrate a short answer. – Rafe Oct 20 '11 at 14:19
24

Is there a function in the .NET Math library?

No.

It's not hard to write your own though. The naive algorithm sorts the array and picks the middle element (or the average of the two middle elements). However, this algorithm is O(n log n), while it's possible to solve this problem in O(n) time. Look at selection algorithms to find such an approach.

jason
5

Here's a generic version of Jason's answer

    /// <summary>
    /// Gets the median value from an array
    /// </summary>
    /// <typeparam name="T">The array type</typeparam>
    /// <param name="sourceArray">The source array</param>
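    /// <param name="cloneArray">Pass false to sort the source array in place, which avoids the copy when reordering the caller's array is acceptable</param>
    /// <returns>The middle element, or the average of the two middle elements when the length is even</returns>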
    public static T GetMedian<T>(T[] sourceArray, bool cloneArray = true) where T : IComparable<T>
    {
        //Framework 2.0 version of this method. there is an easier way in F4        
        if (sourceArray == null || sourceArray.Length == 0)
            throw new ArgumentException("Median of empty array not defined.");

        //make sure the list is sorted, but use a new array
        T[] sortedArray = cloneArray ? (T[])sourceArray.Clone() : sourceArray;
        Array.Sort(sortedArray);

        //get the median
        int size = sortedArray.Length;
        int mid = size / 2;
        if (size % 2 != 0)
            return sortedArray[mid];

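        // Average the two middle elements. dynamic defers operator resolution to run time,
        // so T must support + and / (for integral T the average is truncated).
        dynamic value1 = sortedArray[mid];
        dynamic value2 = sortedArray[mid - 1];
        return (T)((value1 + value2) / 2);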
    }
Will Calderwood
1

Here is the fastest unsafe implementation, using the same algorithm as before, taken from this source:

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static unsafe void SwapElements(int* p, int* q)
    {
        int temp = *p;
        *p = *q;
        *q = temp;
    }

    public static unsafe int Median(int[] arr, int n)
    {
        int middle, ll, hh;

        int low = 0; int high = n - 1; int median = (low + high) / 2;
        fixed (int* arrptr = arr)
        {
            for (;;)
            {
                if (high <= low)
                    return arr[median];

                if (high == low + 1)
                {
                    if (arr[low] > arr[high])
                        SwapElements(arrptr + low, arrptr + high);
                    return arr[median];
                }

                middle = (low + high) / 2;
                if (arr[middle] > arr[high])
                    SwapElements(arrptr + middle, arrptr + high);

                if (arr[low] > arr[high])
                    SwapElements(arrptr + low, arrptr + high);

                if (arr[middle] > arr[low])
                    SwapElements(arrptr + middle, arrptr + low);

                SwapElements(arrptr + middle, arrptr + low + 1);

                ll = low + 1;
                hh = high;
                for (;;)
                {
                    do ll++; while (arr[low] > arr[ll]);
                    do hh--; while (arr[hh] > arr[low]);

                    if (hh < ll)
                        break;

                    SwapElements(arrptr + ll, arrptr + hh);
                }

                SwapElements(arrptr + low, arrptr + hh);

                if (hh <= median)
                    low = ll;
                if (hh >= median)
                    high = hh - 1;
            }
        }
    }
eladm
1

CenterSpace's NMath library provides a function:

double[] values = new double[arraySize];
double median = NMathFunctions.Median(values);

Optionally, you can use NaNMedian (if your array may contain NaN values), but you will need to convert the array to a vector:

double median = NMathFunctions.NaNMedian(new DoubleVector(values));

CenterSpace's NMath library isn't free, but many universities have licenses.

soxfan04
1

For anyone finding this in the future: I think this is as simple as it can get.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Median
{
    class Program
    {
        static void Main(string[] args)
        {
            var medianValue = 0.0;
            var items = new[] { 1, 2, 3, 4, 5 };
            var getLengthItems = items.Length;
            Array.Sort(items);
            if (getLengthItems % 2 == 0)
            {
                // Even count: average the two middle elements.
                var firstValue = items[(items.Length / 2) - 1];
                var secondValue = items[(items.Length / 2)];
                medianValue = (firstValue + secondValue) / 2.0;
            }
            else
            {
                // Odd count: take the single middle element.
                medianValue = items[(items.Length / 2)];
            }
            Console.WriteLine(medianValue);
            Console.WriteLine("Enter to Exit!");
            Console.ReadKey();
        }
    }
}
Krishneil
  • You can actually get by without the if statements. Just set `medianValue = (items[items.Length / 2] + items[(items.Length - 1) / 2])/2`. Thanks to integer division for an odd number of items in your array you'll just get the same item twice and when you add it to itself then divide by two you'll get the same number back. For an even number of items you'll get the two different indexes. You might also consider leaving it as-is for clarity, but this way is better for brevity. – Tom H Dec 18 '20 at 23:21
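The branch-free indexing described in this comment can be sketched as follows. This is an illustration under assumptions rather than anyone's posted code: `BranchlessMedian` is a hypothetical name, the input is copied so the caller's array is left untouched, and the division uses `2.0` with one operand cast to `double` so that the average of two `int` values is neither truncated nor overflowed.

```csharp
using System;

public static class BranchlessMedian
{
    // Median of an int array without an odd/even branch.
    public static double Median(int[] items)
    {
        if (items == null || items.Length == 0)
            throw new ArgumentException("Median of an empty array is not defined.", nameof(items));

        var sorted = (int[])items.Clone(); // sort a copy, not the caller's array
        Array.Sort(sorted);

        // For an odd length both indices are the same middle element;
        // for an even length they are the two middle elements.
        int lower = (sorted.Length - 1) / 2;
        int upper = sorted.Length / 2;
        return (sorted[lower] + (double)sorted[upper]) / 2.0;
    }
}
```

With `{ 1, 2, 3, 4, 5 }` both indices are 2 and the result is 3; with `{ 1, 2, 3, 4 }` the indices are 1 and 2 and the result is 2.5.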
0

The code below works, but it is not very efficient. :(

static void Main(String[] args) {
        int n = Convert.ToInt32(Console.ReadLine());            
        int[] medList = new int[n];

        for (int x = 0; x < n; x++)
            medList[x] = int.Parse(Console.ReadLine());

        //sort the input array:
        //Array.Sort(medList);            
        for (int x = 0; x < n; x++)
        {
            double[] newArr = new double[x + 1];
            for (int y = 0; y <= x; y++)
                newArr[y] = medList[y];

            Array.Sort(newArr);
            int curInd = x + 1;
            if (curInd % 2 == 0) //even
            {
                int mid = (x / 2) <= 0 ? 0 : (newArr.Length / 2);
                if (mid > 1) mid--;
                double median = (newArr[mid] + newArr[mid+1]) / 2;
                Console.WriteLine("{0:F1}", median);
            }
            else //odd
            {
                int mid = (x / 2) <= 0 ? 0 : (newArr.Length / 2);
                double median = newArr[mid];
                Console.WriteLine("{0:F1}", median);
            }
        }

}
0

My 5 cents (because it appears more straightforward/simpler and sufficient for short lists):

public static T Median<T>(this IEnumerable<T> items)
{
    // Materialize once so the sequence is not enumerated twice.
    var values = items.ToList();
    if (values.Count == 0)
        return default(T);

    values.Sort();
    // Upper median: Count / 2 is the same index as Math.Ceiling((Count - 1) / 2.0).
    return values[values.Count / 2];
}

P.S. using "higher median" as described by ShitalShah.

mike
0

I have a histogram stored in the variable group.
Here is how I calculate my median:

int[] group = new int[nbr]; 

// -- Fill the group with values---

// sum all data in median
int median = 0;
for (int i =0;i<nbr;i++) median += group[i];

// then take half, rounding up, so that an odd total targets the middle sample
median = (median + 1) / 2;

// find 50% first part 
for (int i = 0; i < nbr; i++)
{
   median -= group[i];
   if (median <= 0)
   {
      median = i;
      break;
   }
}

After the loop, median holds the index of the group (bin) that contains the median.
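The same cumulative-count idea can be wrapped in a reusable function. This is a minimal sketch with assumed names (`HistogramMedian`, `MedianBin`) and assumed conventions: counts are non-negative, the half-total is rounded up so that an odd number of samples selects the middle one, and for an even number of samples the bin of the lower median is returned.

```csharp
using System;

public static class HistogramMedian
{
    // Returns the index of the bin that contains the (lower) median sample.
    // Assumes every count is non-negative.
    public static int MedianBin(int[] counts)
    {
        if (counts == null)
            throw new ArgumentNullException(nameof(counts));

        long total = 0;
        foreach (int c in counts)
            total += c;
        if (total == 0)
            throw new ArgumentException("Histogram contains no samples.", nameof(counts));

        // 1-based rank of the lower median among all samples.
        long target = (total + 1) / 2;

        long cumulative = 0;
        for (int i = 0; i < counts.Length; i++)
        {
            cumulative += counts[i];
            if (cumulative >= target)
                return i; // first bin whose running total reaches the target rank
        }

        throw new InvalidOperationException("Not reachable when all counts are non-negative.");
    }
}
```

For the counts `{ 2, 1, 2 }` there are five samples, the target rank is 3, and the function returns bin 1. Using `long` for the running totals avoids overflow when many large bins are summed.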

Wai Ha Lee