If I calculate the standard deviation of a sample using this code (modified somewhat from this SO question):
public double CalculateStandardDeviation(List<double> values, bool sample = false)
{
    double mean = 0.0;
    double sum = 0.0; // running sum of squared deviations from the mean
    double stdDev = 0.0;
    int count = 0;
    foreach (double val in values)
    {
        count++;
        double delta = val - mean;
        mean += delta / count;       // update the running mean
        sum += delta * (val - mean); // Welford's online update
    }
    if (1 < count)
        stdDev = Math.Sqrt(sum / (count - (sample ? 1 : 0)));
    return stdDev;
}
Using this unit test:
[Test]
public void Sample_Standard_Deviation_Returns_Expected_Value()
{
    // original cite: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/
    double expected = 2.23606797749979;
    double tolerance = 1.0 / System.Math.Pow(10, 13);
    var cm = new CommonMath(); // a library of math functions we use a lot
    List<double> values = new List<double> { 4.0, 2.0, 5.0, 8.0, 6.0 };
    double actual = cm.CalculateStandardDeviation(values, true);
    Assert.That(actual, Is.EqualTo(expected).Within(tolerance));
}
The test passes with a resultant value within the specified tolerance.
However, if I use this LINQ-ified code, it fails, returning a value of 2.5:
double meanOfValues = values.Average();
double sumOfValues = values.Sum();
int countOfValues = values.Count;
double standardDeviationOfValues =
    Math.Sqrt(sumOfValues / (countOfValues - (sample ? 1 : 0)));
return standardDeviationOfValues;
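For a complete repro, here are those lines dropped into a standalone method with the same signature as the original (a sketch; the class name `CommonMathLinq` and the inline comments are mine, not from our library):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class CommonMathLinq
{
    // Same signature as CalculateStandardDeviation above; reproduces the failure.
    public double CalculateStandardDeviation(List<double> values, bool sample = false)
    {
        double meanOfValues = values.Average(); // 5.0 for the test data (unused below)
        double sumOfValues = values.Sum();      // 25.0: the sum of the raw values
        int countOfValues = values.Count;       // 5
        double standardDeviationOfValues =
            Math.Sqrt(sumOfValues / (countOfValues - (sample ? 1 : 0)));
        return standardDeviationOfValues;       // sqrt(25 / 4) = 2.5
    }
}
```

Calling this with the test's values and `sample: true` returns 2.5, which is the value my unit test reports.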
As I've never taken statistics (so please be gentle), the LINQ-ification (that's a word) of the values from the list seems like it should give me the same results, but it doesn't, and I don't understand what I've done wrong. The choice between N and N-1 is the same in both, so why isn't the answer the same?