I have a list of double arrays: List<Double[]> ys

Each array contains the y-values from an xy-plot. I want to calculate the population standard deviation at every x, that is, across the arrays for each element index. Example:

Take the first element of every array, calculate the population standard deviation, and put the value in a new array. Then move to the next element of every array, calculate the population standard deviation, and add it to the new array, and so on until the end of the arrays is reached.

Is there any way I can achieve this quickly, without nested for loops, using LINQ or similar?

Example input: ys = {[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]}

Output: double[] = [44.69899328, 89.39798655, 134.0969798, 178.7959731, 223.4949664]

44.69899328 comes from: 1, 10, 100

89.39798655 comes from: 2, 20, 200

134.0969798 comes from: 3, 30, 300

178.7959731 comes from: 4, 40, 400

223.4949664 comes from: 5, 50, 500
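
As a check on the first expected value, here is the arithmetic written out, assuming the usual population formula (divide by N rather than N - 1). The symbols \mu, \sigma and N are notation introduced here, not taken from the post:

```latex
\mu = \frac{1 + 10 + 100}{3} = 37
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}
       = \sqrt{\frac{(-36)^2 + (-27)^2 + 63^2}{3}}
       = \sqrt{\frac{5994}{3}}
       = \sqrt{1998} \approx 44.699
```

The other four values follow the same pattern, since each column is the first one scaled by 2, 3, 4 and 5.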

Breakwin
  • Could you please provide an example? What's the desired output for, say, `{[1, 2, 3], [15, 20, 40]}`? – Dmitry Bychenko May 04 '21 at 09:48
  • The output for your example would be [7, 9, 18.5] – Breakwin May 04 '21 at 09:51
  • @Fildor 7 is the standard deviation of the values 1, 15. 9 is the standard deviation of 2, 20. 18.5 is the standard deviation of 3, 40 – Breakwin May 04 '21 at 10:03
  • @Fildor Probably because you are using the sample standard deviation. If you use the population standard deviation, those are the numbers you get. Plug the numbers into a standard deviation calculator of your choice :) I am using this: https://www.calculator.net/standard-deviation-calculator.html – Breakwin May 04 '21 at 10:07
  • Oooooh, did you use Dmitry's example ?? Yeah, sorry, forget it. My bad. Missed you were actually the OP and answering Dmitry ... haha – Fildor May 04 '21 at 10:10
  • @Fildor Yes, I did in this instance. No problem – Breakwin May 04 '21 at 10:12
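
The comments above turn on the difference between the sample and the population standard deviation. A minimal sketch of the two, using the values 1 and 15 from the comments; the class and method names (`StdDevKinds`, `Population`, `Sample`) are made up for illustration and do not come from any library:

```csharp
using System;
using System.Linq;

public static class StdDevKinds
{
    // Population standard deviation: divide the sum of squared deviations by N.
    public static double Population(double[] values)
    {
        double mean = values.Average();
        return Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / values.Length);
    }

    // Sample standard deviation: divide by N - 1 instead (needs at least two values).
    public static double Sample(double[] values)
    {
        double mean = values.Average();
        return Math.Sqrt(values.Sum(v => (v - mean) * (v - mean)) / (values.Length - 1));
    }

    public static void Main()
    {
        double[] values = { 1, 15 };
        Console.WriteLine(Population(values)); // 7
        Console.WriteLine(Sample(values));     // about 9.8995
    }
}
```

With only two values per position the two definitions differ by a factor of sqrt(2), which would explain the mismatch discussed in the comments.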

3 Answers

Try the following:

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        var ys = new List<double[]>
        {
            new double[] { 1, 2, 3, 4, 5 },
            new double[] { 10, 20, 30, 40, 50 },
            new double[] { 100, 200, 300, 400, 500 }
        };
        // Tag each value with its index, group by index, then reduce each group.
        double[] results = ys
            .SelectMany(arr => arr.Select((y, i) => new { y, i }))
            .GroupBy(x => x.i)
            .Select(g => StandardDeviation(g.Select(x => x.y).ToArray()))
            .ToArray();
    }
    // Population standard deviation: divides by N, not N - 1.
    static double StandardDeviation(double[] input)
    {
        double average = input.Average();
        double sumOfSquares = input.Select(x => (average - x) * (average - x)).Sum();
        return Math.Sqrt(sumOfSquares / input.Length);
    }
}
jdweng
  • I believe you have taken the standard deviation of every array, which is not quite what I needed. I need the standard deviation across the arrays at each element position. Please see the example in the original post; I have revised it to better reflect my question – Breakwin May 04 '21 at 10:42
  • I updated the code just before your comment. – jdweng May 04 '21 at 10:44
  • I updated the code slightly to make it a little simpler. – jdweng May 04 '21 at 10:57

For data where all sub-arrays have the same length, this could be:

var stdDevs = Enumerable.Range(0, ys[0].Length)
    .Select(i => ys.Select(y => y[i]))
    .Select(StdDev); 

The last part can be .Select(Z => new { Z, V = StdDev(Z) }); if you also want the input values alongside each result.

Test:

var ys = new[] { new[] { 1, 2, 3, 4, 5 }, new[] { 10, 20, 30, 40, 50 }, new[] { 100, 200, 300, 400, 500 } };

var stdDevs = Enumerable.Range(0, ys[0].Length)
    .Select(i => ys.Select(y => y[i]))
    .Select(Z => new { Z, V = StdDev(Z) });

foreach(var d in stdDevs)
{
    Console.WriteLine($"Std dev for {string.Join(",", d.Z)} is {d.V}");
}

static double StdDev(IEnumerable<int> values)
{
    // From https://stackoverflow.com/questions/3141692/standard-deviation-of-generic-list
    // by Jonathan DeMarks   
    double avg = values.Average();
    return Math.Sqrt(values.Average(v=>Math.Pow(v-avg,2)));
}

Output:

Std dev for 1,10,100 is 44.69899327725402
Std dev for 2,20,200 is 89.39798655450804
Std dev for 3,30,300 is 134.09697983176207
Std dev for 4,40,400 is 178.79597310901607
Std dev for 5,50,500 is 223.4949663862701

Different lengths

If the sub-arrays have different lengths, the version is not as pretty but is still readable:

var stdDevs = Enumerable.Range(0, ys.Max( y => y.Length))
    .Select(i => ys.Where( y => i < y.Length).Select(y => y[i]))
    .Select(Z => new { Z, V = StdDev(Z) }); 

If this is run with the 5 and the 500 removed from the test input, the result is:

Std dev for 1,10,100 is 44.69899327725402
Std dev for 2,20,200 is 89.39798655450804
Std dev for 3,30,300 is 134.09697983176207
Std dev for 4,40,400 is 178.79597310901607
Std dev for 50 is 0
tymtam

I would start by defining an extension method that pivots your data:

public static class Extensions
{
    public static IEnumerable<T[]> Pivot<T>(this List<T[]> items)
    {
        return items.SelectMany( arr => arr.Select( (x,i) => new{Value=x,Index = i}) )
                    .GroupBy(x => x.Index)
                    .Select(g => g.Select(x => x.Value).ToArray());
    }
}
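
To make the pivot step concrete, here is a small sketch of what it yields for a two-row input. The `Pivot` body is repeated so the snippet compiles on its own; the class name `PivotSketch` and the sample data are assumptions for illustration only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PivotSketch
{
    // Same shape as the Pivot extension above: tag each value with its index,
    // group by index, and emit one array per index.
    public static IEnumerable<T[]> Pivot<T>(this List<T[]> items) =>
        items.SelectMany(arr => arr.Select((x, i) => new { Value = x, Index = i }))
             .GroupBy(x => x.Index)
             .Select(g => g.Select(x => x.Value).ToArray());

    public static void Main()
    {
        var ys = new List<double[]>
        {
            new double[] { 1, 2, 3 },
            new double[] { 10, 20, 30 },
        };

        foreach (double[] column in ys.Pivot())
        {
            Console.WriteLine(string.Join(", ", column));
        }
        // prints:
        // 1, 10
        // 2, 20
        // 3, 30
    }
}
```

Because grouping is by index, rows of unequal length simply produce shorter groups for the later indices rather than throwing.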

Then the code, along with a simple implementation of StDev, becomes as simple as:

var res = ys.Pivot().Select(StDev);

StDev function:

public static double StDev(double[] input)
{
    double avg = input.Average();
    double sum = input.Select(x => (avg - x) * (avg - x)).Sum();

    return Math.Sqrt(sum / input.Length);
}

Live example: https://dotnetfiddle.net/g3HqRF

Jamiec