7

my program manipulates STL vectors of integers but, from time to time, I need to calculate a few statistics on them. Therefore I use the GSL functions. To avoid copying the STL vector into a GSL vector, I create a GSL vector view, and give it to the GSL functions, as in this piece of code:

#include <iostream>
#include <vector>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_statistics.h>
using namespace std;

int main( int argc, char* argv[] )
{
  vector<int> stl_v;
  for( int i=0; i<5; ++i )
    stl_v.push_back( i );

  gsl_vector_int_const_view gsl_v = gsl_vector_int_const_view_array( &stl_v[0], stl_v.size() );

  for( int i=0; i<stl_v.size(); ++i )
    cout << "gsl_v_" << i << "=" << gsl_vector_int_get( &gsl_v.vector, i ) << endl;

  cout << "mean=" << gsl_stats_mean( (double*) gsl_v.vector.data, 1, stl_v.size() ) << endl;
}

Once compiled (gcc -lstdc++ -lgsl -lgslcblas test.cpp), this code outputs this:

gsl_v_0=0
gsl_v_1=1
gsl_v_2=2
gsl_v_3=3
gsl_v_4=4
mean=5.73266e-310

The vector view is properly created but I don't understand why the mean is wrong (it should be equal to 10/5=2). Any idea? Thanks in advance.

tflutre
  • 2,934
  • 4
  • 36
  • 52

6 Answers6

4

The cast to double* is very suspicious.

Any time you are tempted to use a cast, think again. Then look for a way to do it without a cast (maybe by introducing a temporary variable if the conversion is implicit). Then think a third time before you cast.

Since the memory region does not actually contain double values, the code is simply interpreting the bit patterns there as if they represented doubles, with predictably undesired effects. Casting an int* to double* is VERY different from casting each element of the array.

Ben Voigt
  • 260,885
  • 36
  • 380
  • 671
  • Ok, from now on I will be more cautious with cast. Mark B also proposed a solution with a temporary vector, but does this mean that my initial vector has to be copied temporarily? What if this vector is very big? – tflutre Jan 31 '11 at 16:21
3

Use the integer statistics functions:

cout << "mean=" << gsl_stats_int_mean( gsl_v.vector.data, 1, stl_v.size() ) << endl;

Note the gsl_stats_int_mean instead of gsl_stats_mean.

Philipp
  • 43,805
  • 12
  • 78
  • 104
2

Unless you're doing a lot of statistics considerably more complex than the mean, I'd ignore gsl and just use standard algorithms:

double mean = std::accumulate(stl_v.begin(), stl_v.end(), 0.0) / stl_v.size();

When/if using a statistical library is justified, your first choice should probably be to look for something else that's better designed (e.g., Boost Accumulators).

If you decide, for whatever reason, that you really need to use gsl, it looks like you'll have to copy your array of ints to an array of doubles first, then use gsl on the result. This is obvious quite inefficient, especially if you're dealing with a lot of data -- thus the previous advice to use something else instead.

Jerry Coffin
  • 437,173
  • 71
  • 570
  • 1,035
  • 1
    I'm writing a simulator, thus I need a RNG as well as functions to compute mean, var, sd, quantiles, and so on. That's why I would like to keep using the GSL, and not use std::accumulate. – tflutre Jan 31 '11 at 16:26
  • @wfoolhill: The standard library already provides PRNG, and Boost accumulators (among many others) can compute all the functions you've mentioned -- much more cleanly than gsl even hopes to. – Jerry Coffin Jan 31 '11 at 16:28
  • I don't know Boost, thus following your advice, I had a quick look at the documentation of Boost.Accumulator. It seems indeed very powerful but I don't think I need all its functionalities. I have all I need with the GSL up to now. – tflutre Feb 01 '11 at 01:07
  • @wfoolhill: Everything you need other than working, of course. And even if you fix that, it's still only "other than working well." – Jerry Coffin Feb 01 '11 at 02:03
1

Although I'm not familiar with GSL, the expression (double*) gsl_v.vector.data looks extremely suspicious. Are you sure it's correct to reinterpret_cast that pointer to get double data?

aschepler
  • 65,919
  • 8
  • 93
  • 144
1

Casting to double* is messing up your data. It is not converting data into double, but just using int binary data as double

Elalfer
  • 5,146
  • 18
  • 24
1

According to http://www.gnu.org/software/gsl/manual/html_node/Mean-and-standard-deviation-and-variance.html the gsl_stats_mean function takes an array of double. You're taking a vector of int and telling it to use the raw bytes as double which isn't going to work right.

You'll need to set up a temporary vector of double to pass in:

// Assumes that there's at least one item in stl_v.
std::vector<double> tempForStats(stl_v.begin(), stl_v.end());
gsl_stats_mean(&tempForStats[0], 1, tempForStats.size());

EDIT: You could also use standard library algorithms to do the int mean yourself:

// Assumes that there's at least one item in stl_v.
double total = std::accumulate(stl_v.begin(), stl_v.end(), 0);
double mean = total / stl_v.size();
Mark B
  • 91,641
  • 10
  • 102
  • 179
  • Thanks, but by using a vector view, I wanted to avoid a temporary vector because, if the initial vector is very big, I will loose time copying it into a temporary one... Or is "tempForStats(stl_v.begin(), stl_v.end());" very efficient? – tflutre Jan 31 '11 at 16:23