2

I've need to find the average and standard deviation of a large amount of data in this format. I tried using Excel but there doesn't appear to be an easy way to transpose the columns. What am I missing in Excel or should I just use Perl?

Input file format is:

0 123

0 234

0 456

1 657

1 234

1 543

Want result to group the averages and standard deviations by the values in the first column:

0 AvgOfAllZeros StdDevOfALlZeros

1 AvgOfAllOnes StdDevOfAllOnes

brian d foy
  • 121,466
  • 31
  • 192
  • 551
  • 1
    Excel has TRANSPOSE as a function which converts horizontal to vertical array or vice versa or did you mean somthing other than transpose? – mmmmmm Sep 11 '09 at 20:03
  • 2
    If just loading look at pivot tables and choose avg or others instead of sum in the table – mmmmmm Sep 11 '09 at 20:04

7 Answers7

3

This is easy to do in R. If your data is in a file called foo, then this code will do the trick:

> data <- read.table("foo")
> cbind(avg=with(data, tapply(V2, V1, mean)),
+       stddev=with(data, tapply(V2, V1, sd)))
  avg   stddev
0 271 169.5553
1 478 218.8630
Jonathan Chang
  • 21,599
  • 5
  • 31
  • 32
2

crack's knuckles

using the Statistics::Descriptive CPAN module, you can get it with this:

use strict;
use warnings;
use Statistics::Descriptive;

my ($file) = @ARGV;

my @zeroes;
my @ones;

# Reading it in
open my $fh, '<', $file or die "unable to open '$file', $!";

while (my $line = <$fh>)
{
   chomp $line;
   my ($value, $number) = split("\s+", $line);
   if ($value)
   {
      push @ones, $number;
   }
   else
   {
      push @zeroes, $number;
   }
}
close $fh or warn "Can't close fh! $!";

# Stat processing
$stat_zeroes   = Statistics::Descriptive::Full->new();
$stat_ones     = Statistics::Descriptive::Full->new();

$stat_zeroes->add_data(@zeroes);
$stat_ones->add_data(@ones);

print "0: ", $stat_zeroes->mean(), " ", $stat_zeroes->standard_deviation(), "\n",
      "1: ", $stat_ones->mean(), " ", $stat_zeroes->standard_deviation(), "\n";
Brad Gilbert
  • 32,263
  • 9
  • 73
  • 122
Robert P
  • 15,185
  • 8
  • 66
  • 110
2

If you do this manually in Excel you can copy the data and then Paste it with Paste Special menu option. There is a Transpose check box there.

If you do this more frequently here is a Perl script. Memory complexity is linear to the size of output, so constant in case of only two rows:

#!/usr/bin/perl

while (<>) {
    my ($x, $y) = split;
    $sum{$x} += $y;
    $count{$x}++;
    $sumSq{$x} += $y * $y;
}

for $i (sort keys %sum) {
    $stdev = sqrt(($sumSq{$i} - $sum{$i} * $sum{$i} / $count{$i}) / ($count{$i} - 1));
    print $i, " ", $sum{$i}/$count{$i}, " ", $stdev, "\n";
}
agsamek
  • 7,574
  • 11
  • 34
  • 42
1

Excel can be used. There is an AVERAGEIF function but nothing similar for STDEV, so an alternate two step method is required.

The data can be 'transposed' by adding two columns with formulas to the right. Assuming your data is in column A and B, the formula in column C would be;

=IF(A2=0,B2,"")

In column D it would be;

=IF(A2=1,B2,"")

Then the new columns can have formulas added to the bottom

for Average

=AVERAGE(C2:C7)

And for StdDev

=STDEV(C2:C7)

Excel standard deviation

LondonRob
  • 53,478
  • 30
  • 110
  • 152
Robert Mearns
  • 10,788
  • 2
  • 35
  • 40
0

Have you tried using the AVERAGEIF function of Excel?

PezHead
  • 23
  • 5
0

If you dealing with a large set of data then you should consider PDL... the Perl Data Language.

See this related SO answer.

Community
  • 1
  • 1
draegtun
  • 22,021
  • 5
  • 45
  • 70
0

I would use the SUMIF and COUNTIF formulas. You would have to add an extra column or two to get the squared deviations in order to figure out the standard deviation. One example looks like this

alt text

with the formula in B10 =SUMIF($A$2:$A$7,"="&A10,$B$2:$B$7)/COUNTIF($A$2:$A$7,"="&A10) and in B11 =SQRT(SUMIF($A$2:$A$7,"="&A10,$D$2:$D$7)/COUNTIF($A$2:$A$7,"="&A10))

Glorfindel
  • 19,729
  • 13
  • 67
  • 91
af.
  • 141
  • 2