46

I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc..

Bill the Lizard
  • 369,957
  • 201
  • 546
  • 842
dead and bloated
  • 463
  • 1
  • 5
  • 5
  • 3
    This seems to be missing the homework tag.... – overslacked May 22 '09 at 00:39
  • 12
    please please please please don't assume the OP is asking a question for homework purposes, rather than for a "real" project or for self-improvement. Ask them. – Jason S May 22 '09 at 13:00
  • 3
    i actually am not asking for homework reasons, but if it helps people who are doing homework to find the answer, then please add the tag – dead and bloated May 24 '09 at 05:39
  • 1
    @overslacked The homework tag is being phased out and must not be used anymore (as I just learned myself) - http://meta.stackexchange.com/q/147100 – vzwick Sep 15 '12 at 23:15

12 Answers12

104

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...

Plus, don't never, ever, ever, compute a^2 as pow(a,2), a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C is very rusty, but it could look something like:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 1;
    foreach (double value in valueList) 
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    return Math.Sqrt(S / (k-2));
}

If you have the whole population (as opposed to a sample population), then use return Math.Sqrt(S / (k-1));.

EDIT: I've updated the code according to Jason's remarks...

EDIT: I've also updated the code according to Alex's remarks...

Jaime
  • 59,107
  • 15
  • 108
  • 149
  • 2
    +1: I've read Knuth's commentary on this but have never known it was called Welford's method. FYI you can eliminate the k==1 case, it just works. – Jason S May 22 '09 at 12:36
  • 2
    OH: and you're forgetting the divide-by-N or divide-by-N-1 at the end. – Jason S May 22 '09 at 12:57
  • 7
    Now look what you've done. You've given me something new to learn. You beast. – dmckee --- ex-moderator kitten May 22 '09 at 16:30
  • 6
    John Cook has a good article on standard deviation for large values at http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/, and a followup describing the reasons behind the algorithms at http://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/. – Emperor XLII Jun 07 '09 at 17:37
  • It actually is John's blog that I'm linking for a description of Welford's method... – Jaime Jun 08 '09 at 15:39
  • 3
    Actually, if you have the whole list beforehand, a corrected 2-pass algorithm will do fine (cf. eg. Numerical Recipes). This method is when you have a *stream* of values that you don't want to store. – Alexandre C. Apr 05 '11 at 10:31
  • 2
    Actually, your variable k ends up being 1 larger than the actual number of points since it starts at 1 (e.g. with 2 points, k = 3). This means that you need to subtract that additional 1 in the last step, so k-1 for the whole population and k-2 for the sample population. – alex.forencich Jun 26 '14 at 04:33
  • 2
    Alternatively you can start k at zero, but increment it first instead of last. Then set it back to k-1 at the end. – alex.forencich Jun 26 '14 at 05:52
  • This answer was fantastically helpful. My statistics knowledge is very rusty and most of the Welford method guides are pretty math heavy - this gave me what I needed in Python in 5 minutes. Thanks a million. – EnemyBagJones May 11 '16 at 16:12
  • @Jaime, I think sample should be a bool parameter and the last line should be: `return S == 0 ? 0 : Math.Sqrt(S / (k - (sample ? 2 : 1)));` to handle cases of 1 value such as `[15.8569444444444]` based on other answers such as:https://stackoverflow.com/questions/2253874/standard-deviation-in-linq/2878000#2878000 I would also say: `return Math.Sqrt(S / (k - (sample ? 1 : 0)));` would work – Jay May 29 '20 at 03:06
8

10 times faster solution than Jaime's, but be aware that, as Jaime pointed out:

"While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance"

If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate using both methods, if the results are equal, you know for sure that you can use "my" method for your case.

    public static double StandardDeviation(double[] data)
    {
        double stdDev = 0;
        double sumAll = 0;
        double sumAllQ = 0;

        //Sum of x and sum of x²
        for (int i = 0; i < data.Length; i++)
        {
            double x = data[i];
            sumAll += x;
            sumAllQ += x * x;
        }

        //Mean (not used here)
        //double mean = 0;
        //mean = sumAll / (double)data.Length;

        //Standard deviation
        stdDev = System.Math.Sqrt(
            (sumAllQ -
            (sumAll * sumAll) / data.Length) *
            (1.0d / (data.Length - 1))
            );

        return stdDev;
    }
Pedro77
  • 4,773
  • 7
  • 50
  • 82
5

The accepted answer by Jaime is great, except you need to divide by k-2 in the last line (you need to divide by "number_of_elements-1"). Better yet, start k at 0:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList) 
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k-1));
}
AlexB
  • 71
  • 1
  • 2
  • with the running formula, you better use decimal instead of double to mitigate rounding and truncation errors and convert the result back to double –  Jul 17 '19 at 19:25
5

The Math.NET library provides this for you to of the box.

PM> Install-Package MathNet.Numerics

var populationStdDev = new List<double>(1d, 2d, 3d, 4d, 5d).PopulationStandardDeviation();

var sampleStdDev = new List<double>(2d, 3d, 4d).StandardDeviation();

See PopulationStandardDeviation for more information.

Chris Marisic
  • 30,638
  • 21
  • 158
  • 255
  • link broken, use https://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Statistics.htm#PopulationStandardDeviation – alv Jun 30 '19 at 07:12
2

You can avoid making two passes over the data by accumulating the mean and mean-square

cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt

and forming

sigma = sqrt(meansqr - mean^2)

A factor of cnt/(cnt-1) is often appropriate as well.

BTW-- The first pass over the data in Demi and McWafflestix answers are hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceed the size of the cache, or even the working set, this gets to be a bid deal.

Community
  • 1
  • 1
dmckee --- ex-moderator kitten
  • 90,531
  • 23
  • 129
  • 225
  • 1
    Your formula is wrong. It should be sigma = sqrt( meansqr - mean^2 ) Read this page http://en.wikipedia.org/wiki/Standard_deviation carefully to see your mistake. – leif May 22 '09 at 00:50
  • @leif: Yep. And I should have noticed the dimensional problem, too. – dmckee --- ex-moderator kitten May 22 '09 at 01:28
  • 1
    @Jason: You are worried about loss of precision effects? Or what? I just don't see it....OK I followed the link on Jamie's answer. Loss of Precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the the 10^-9 level, and we generally use double precision for everything, so those populations we get with variation confined to the 10^-5 level come out OK anyway. – dmckee --- ex-moderator kitten May 22 '09 at 16:23
2

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); //.NET 3.0
    foreach (double value in valueList) 
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
Demi
  • 5,941
  • 7
  • 33
  • 38
  • 1
    Dividing by Count - 1 or Count depends on whether we're talking about entire population or sample, yes? Looks like OP is talking about a known population but not entirely clear. –  May 22 '09 at 01:55
  • That is correct - this is for sample variance. I appreciate the highlight. – Demi May 22 '09 at 01:58
  • Your code crashes for the legitimate case of a list with one value. – SPWorley May 22 '09 at 02:39
  • 1
    for a sample stddev you shouldn't be passing an list with one item. A sample stddev of one item is worthless. – Demi May 22 '09 at 03:23
1

With Extension methods.

using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> {1, 2, 3, 4, 5, 6};

            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();

            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;

            for (int i = start; i < end; i++)
            {
                s += values[i];
            }

            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;

            for (int i = start; i < end; i++)
            {
                variance += Math.Pow((values[i] - mean), 2);
            }

            int n = end - start;
            if (start > 0) n -= 1;

            return variance / (n);
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);

            return Math.Sqrt(variance);
        }
    }
}
Rikin Patel
  • 7,983
  • 7
  • 66
  • 73
1

I found that Rob's helpful answer didn't quite match what I was seeing using excel. To match excel, I passed the Average for valueList in to the StandardDeviation calculation.

Here is my two cents... and clearly you could calculate the moving average (ma) from valueList inside the function - but I happen to have already before needing the standardDeviation.

public double StandardDeviation(List<double> valueList, double ma)
{
   double xMinusMovAvg = 0.0;
   double Sigma = 0.0;
   int k = valueList.Count;


  foreach (double value in valueList){
     xMinusMovAvg = value - ma;
     Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
  }
  return Math.Sqrt(Sigma / (k - 1));
}       
0

The trouble with all the other answers is that they assume you have your data in a big array. If your data is coming in on the fly, this would be a better approach. This class works regardless of how or if you store your data. It also gives you the choice of the Waldorf method or the sum-of-squares method. Both methods work using a single pass.

public final class StatMeasure {
  private StatMeasure() {}

  public interface Stats1D {

    /** Add a value to the population */
    void addValue(double value);

    /** Get the mean of all the added values */
    double getMean();

    /** Get the standard deviation from a sample of the population. */
    double getStDevSample();

    /** Gets the standard deviation for the entire population. */
    double getStDevPopulation();
  }

  private static class WaldorfPopulation implements Stats1D {
    private double mean = 0.0;
    private double sSum = 0.0;
    private int count = 0;

    @Override
    public void addValue(double value) {
      double tmpMean = mean;
      double delta = value - tmpMean;
      mean += delta / ++count;
      sSum += delta * (value - mean);
    }

    @Override
    public double getMean() { return mean; }

    @Override
    public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

    @Override
    public double getStDevPopulation() { return Math.sqrt(sSum / (count)); }
  }

  private static class StandardPopulation implements Stats1D {
    private double sum = 0.0;
    private double sumOfSquares = 0.0;
    private int count = 0;

    @Override
    public void addValue(double value) {
      sum += value;
      sumOfSquares += value * value;
      count++;
    }

    @Override
    public double getMean() { return sum / count; }

    @Override
    public double getStDevSample() {
      return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
    }

    @Override
    public double getStDevPopulation() {
      return (float) Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
    }
  }

  /**
   * Returns a way to measure a population of data using Waldorf's method.
   * This method is better if your population or values are so large that
   * the sum of x-squared may overflow. It's also probably faster if you
   * need to recalculate the mean and standard deviation continuously,
   * for example, if you are continually updating a graphic of the data as
   * it flows in.
   *
   * @return A Stats1D object that uses Waldorf's method.
   */
  public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }

  /**
   * Return a way to measure the population of data using the sum-of-squares
   * method. This is probably faster than Waldorf's method, but runs the
   * risk of data overflow.
   *
   * @return A Stats1D object that uses the sum-of-squares method
   */
  public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
MiguelMunoz
  • 3,787
  • 1
  • 25
  • 31
0

We may be able to use statistics module in Python. It has stedev() and pstdev() commands to calculate standard deviation of sample and population respectively.

details here: https://www.geeksforgeeks.org/python-statistics-stdev/

import statistics as st print(st.ptdev(dataframe['column name']))

Dhaval
  • 1
  • 1
0

This is Population standard deviation

private double calculateStdDev(List<double> values)
{
    double average = values.Average();
    return Math.Sqrt((values.Select(val => (val - average) * (val - average)).Sum()) / values.Count);
}

For Sample standard deviation, just change [values.Count] to [values.Count -1] in above code.

Make sure you don't have only 1 data point in your set.

0
/// <summary>
/// Calculates standard deviation, same as MATLAB std(X,0) function
/// <seealso cref="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">enumumerable data</param>
/// <returns>Standard deviation</returns>
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    //validation
    if (values == null)
        throw new ArgumentNullException();

    int lenght = values.Count();

    //saves from devision by 0
    if (lenght == 0 || lenght == 1)
        return 0;

    double sum = 0.0, sum2 = 0.0;

    for (int i = 0; i < lenght; i++)
    {
        double item = values.ElementAt(i);
        sum += item;
        sum2 += item * item;
    }

    return Math.Sqrt((sum2 - sum * sum / lenght) / (lenght - 1));
}
oleksii
  • 33,544
  • 11
  • 83
  • 149