
I am using the HDF5 C++ API to write 2D array dataset files. The HDF Group has an example that creates an HDF5 file from an array whose size is defined statically, which I've modified to suit my needs below. However, I need a dynamic array, where both NX and NY are determined at runtime. I've found another solution that uses the "new" keyword to create a dynamic 2D array. Here is what I have:

#include "StdAfx.h"
#include "H5Cpp.h"
using namespace H5;

const H5std_string FILE_NAME("C:\\SDS.h5");
const H5std_string DATASET_NAME("FloatArray");
const int NX = 5; // dataset dimensions
const int NY = 6;

int main (void)
{
    // Create a 2D array using "new" method
    double **data = new double*[NX];
    for (int j = 0; j < NX; j++)         // 0 1 2 3 4 5
    {                                    // 1 2 3 4 5 6
        data[j] = new double[NY];        // 2 3 4 5 6 7
        for (int i = 0; i < NY; i++)     // 3 4 5 6 7 8
            data[j][i] = (float)(i + j); // 4 5 6 7 8 9
    }

    // Create HDF5 file and dataset
    H5File file(FILE_NAME, H5F_ACC_TRUNC);
    hsize_t dimsf[2] = {NX, NY};
    DataSpace dataspace(2, dimsf);
    DataSet dataset = file.createDataSet(DATASET_NAME, PredType::NATIVE_DOUBLE,
                                            dataspace);
    // Attempt to write data to HDF5 file
    dataset.write(data, PredType::NATIVE_DOUBLE);

    // Clean up
    for(int j = 0; j < NX; j++)
        delete [] data[j];
    delete [] data;
    return 0;
}

The resulting file, however, is not as expected (output from h5dump):

HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "FloatArray" {
      DATATYPE  H5T_IEEE_F64LE
      DATASPACE  SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
      DATA {
      (0,0): 4.76465e-307, 4.76541e-307, -7.84591e+298, -2.53017e-098, 0,
      (0,5): 3.8981e-308,
      (1,0): 4.76454e-307, 0, 2.122e-314, -7.84591e+298, 0, 1,
      (2,0): 2, 3, 4, 5, -2.53017e-098, -2.65698e+303,
      (3,0): 0, 3.89814e-308, 4.76492e-307, 0, 2.122e-314, -7.84591e+298,
      (4,0): 1, 2, 3, 4, 5, 6
      }
   }
}
}

The problem stems from how the 2D array was created (the same example works fine with a statically sized array). As I understand from this email thread:

The HDF5 library expects a contiguous array of elements, not pointers to elements in lower dimensions

As I am rather new to C++/HDF5, I'm not sure how to create an array that is sized at runtime and is still a contiguous array of elements. I would rather avoid the "hyperslab" method described in the email thread, as it looks overly complicated for this task. Any help is appreciated.

Mike T

4 Answers


Well, I don't know anything about HDF5, but dynamic 2D arrays in C++ with a contiguous buffer can be simulated by using a 1D array of size NX * NY. For example:

Allocation:

double *data = new double[NX*NY];

Element access:

 data[j*NY + i]

(instead of data[j][i])
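
As a minimal sketch of the idea above, the buffer could also be held in a std::vector, which releases its memory automatically. The helper name make_flat_array is hypothetical and not part of HDF5; the commented-out write call assumes the dataset object from the question.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: builds an nx-by-ny array in one contiguous,
// row-major buffer and fills element (j, i) with i + j, as in the question.
std::vector<double> make_flat_array(std::size_t nx, std::size_t ny)
{
    std::vector<double> data(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        for (std::size_t i = 0; i < ny; ++i)
            data[j * ny + i] = static_cast<double>(i + j);
    return data;
}

// With the HDF5 objects from the question (assumed, not shown here),
// the write would then pass the flat buffer:
//     dataset.write(data.data(), PredType::NATIVE_DOUBLE);
```

Because the vector stores its elements contiguously, data.data() points at all NX * NY values in row-major order, which is the layout that the dataspace dimensions {NX, NY} describe.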

Doc Brown
  • This solution is simple to implement and scales really well (I'm using array sizes with 21.9M elements). The result verifies perfectly in the HDF5 file output. – Mike T Sep 14 '11 at 19:27
  • Works like a charm. Thank you. – CuriousCase Jun 20 '16 at 15:43
  • I know that this is quite old, but, I think there is an error in the last sentence. It should be: (instead of `data[i][j]` ) – pablo_worker Nov 09 '16 at 13:47
  • @pablo_worker: the last sentence is correct if j is in the range 0..(NX-1) and i in the range 0..(NY-1), like in the OP's question. – Doc Brown Nov 09 '16 at 14:05
  • @DocBrown You are right. I always use `i` for rows and `j` for columns, and I didn't notice that OP was using different notation. Sorry. – pablo_worker Nov 10 '16 at 07:37

Here is how to write N-dimensional arrays in HDF5 format.

It is much better to use the Boost multi_array class. This is the equivalent of using std::vector rather than raw arrays: it does all the memory management for you, and you can access elements as efficiently as with raw arrays using familiar subscripting (e.g. data[12][13] = 46).

Here is a short example:

#include <algorithm>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"
using boost::multi_array;
using boost::extents;

// dataset dimensions set at run time
int NX = 5,  NY = 6,  NZ = 7;


// allocate array using the "extents" helper. 
// This makes it easier to see how big the array is
multi_array<double, 3>  float_data(extents[NX][NY][NZ]);

// use resize to change size when necessary
// float_data.resize(extents[NX + 5][NY + 4][NZ + 3]);


// This is how you would fill the entire array with a value (e.g. 3.0)
std::fill_n(float_data.data(), float_data.num_elements(), 3.0);

// initialise the array with some values
for (int ii = 0; ii != NX; ii++)
    for (int jj = 0; jj != NY; jj++)
        for (int kk = 0; kk != NZ; kk++)
            float_data[ii][jj][kk]  = ii + jj + kk;

// write to HDF5 format
H5::H5File file("SDS.h5", H5F_ACC_TRUNC);
write_hdf5(file, "doubleArray", float_data );

The last line calls a function which can write multi_arrays of any dimension and any standard number type (ints, chars, floats etc).

Here is code for write_hdf5().

First, we must map C++ types to HDF5 types (from the H5 C++ API):

#include <cstdint>
#include <string>
#include <vector>
#include <boost/multi_array.hpp>
#include "H5Cpp.h"

//!_______________________________________________________________________________________
//!     
//!     map types to HDF5 types
//!         
//!     
//!     \author lg (04 March 2013)
//!_______________________________________________________________________________________ 

template<typename T> struct get_hdf5_data_type
{   static H5::PredType type()  
    {   
        //static_assert(false, "Unknown HDF5 data type"); 
        return H5::PredType::NATIVE_DOUBLE; 
    }
};
template<> struct get_hdf5_data_type<char>                  {   H5::IntType type    {   H5::PredType::NATIVE_CHAR       };  };
//template<> struct get_hdf5_data_type<unsigned char>       {   H5::IntType type    {   H5::PredType::NATIVE_UCHAR      };  };
//template<> struct get_hdf5_data_type<short>               {   H5::IntType type    {   H5::PredType::NATIVE_SHORT      };  };
//template<> struct get_hdf5_data_type<unsigned short>      {   H5::IntType type    {   H5::PredType::NATIVE_USHORT     };  };
//template<> struct get_hdf5_data_type<int>                 {   H5::IntType type    {   H5::PredType::NATIVE_INT        };  };
//template<> struct get_hdf5_data_type<unsigned int>        {   H5::IntType type    {   H5::PredType::NATIVE_UINT       };  };
//template<> struct get_hdf5_data_type<long>                {   H5::IntType type    {   H5::PredType::NATIVE_LONG       };  };
//template<> struct get_hdf5_data_type<unsigned long>       {   H5::IntType type    {   H5::PredType::NATIVE_ULONG      };  };
template<> struct get_hdf5_data_type<long long>             {   H5::IntType type    {   H5::PredType::NATIVE_LLONG      };  };
template<> struct get_hdf5_data_type<unsigned long long>    {   H5::IntType type    {   H5::PredType::NATIVE_ULLONG     };  };
template<> struct get_hdf5_data_type<int8_t>                {   H5::IntType type    {   H5::PredType::NATIVE_INT8       };  };
template<> struct get_hdf5_data_type<uint8_t>               {   H5::IntType type    {   H5::PredType::NATIVE_UINT8      };  };
template<> struct get_hdf5_data_type<int16_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT16      };  };
template<> struct get_hdf5_data_type<uint16_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT16     };  };
template<> struct get_hdf5_data_type<int32_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT32      };  };
template<> struct get_hdf5_data_type<uint32_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT32     };  };
template<> struct get_hdf5_data_type<int64_t>               {   H5::IntType type    {   H5::PredType::NATIVE_INT64      };  };
template<> struct get_hdf5_data_type<uint64_t>              {   H5::IntType type    {   H5::PredType::NATIVE_UINT64     };  };
template<> struct get_hdf5_data_type<float>                 {   H5::FloatType type  {   H5::PredType::NATIVE_FLOAT      };  };
template<> struct get_hdf5_data_type<double>                {   H5::FloatType type  {   H5::PredType::NATIVE_DOUBLE     };  };
template<> struct get_hdf5_data_type<long double>           {   H5::FloatType type  {   H5::PredType::NATIVE_LDOUBLE    };  };
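
The commented-out static_assert(false, ...) in the primary template above would fire for every type if it were enabled, because its condition does not depend on T. A minimal sketch of one way around that is shown below. The names always_false and hdf5_type_name are hypothetical, and the trait returns the name of the predefined type as a string (an assumption made so the sketch compiles without HDF5) where real code would return the matching H5::PredType.

```cpp
#include <cstdint>
#include <string>

// False for every T, but dependent on T, so the static_assert below is
// only evaluated when the primary template is actually instantiated.
template <typename T> struct always_false { static const bool value = false; };

// Primary template: any type without a specialisation fails to compile
// as soon as get() is used.
template <typename T> struct hdf5_type_name
{
    static std::string get()
    {
        static_assert(always_false<T>::value, "Unknown HDF5 data type");
        return std::string();
    }
};

// Specialisations for the types that are supported in this sketch.
template <> struct hdf5_type_name<float>
{   static std::string get() { return "NATIVE_FLOAT"; } };
template <> struct hdf5_type_name<double>
{   static std::string get() { return "NATIVE_DOUBLE"; } };
template <> struct hdf5_type_name<std::int32_t>
{   static std::string get() { return "NATIVE_INT32"; } };
```

Using a static member function in both the primary template and the specialisations keeps the interface uniform, so calling code can write hdf5_type_name<T>::get() for any mapped type.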

Then we can use a bit of template forwarding magic to make a function of the right type to output our data. Since this is template code, it needs to live in a header file if you are going to output HDF5 arrays from multiple source files in your programme:

//!_______________________________________________________________________________________
//!     
//!     write_hdf5 multi_array
//!         
//!     \author leo Goodstadt (04 March 2013)
//!     
//!_______________________________________________________________________________________
template<typename T, std::size_t DIMENSIONS, typename hdf5_data_type>
void do_write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data, hdf5_data_type& datatype)
{
    // Little endian for x86
    //FloatType datatype(get_hdf5_data_type<T>::type());
    datatype.setOrder(H5T_ORDER_LE);

    std::vector<hsize_t> dimensions(data.shape(), data.shape() + DIMENSIONS);
    H5::DataSpace dataspace(DIMENSIONS, dimensions.data());

    H5::DataSet dataset = file.createDataSet(data_set_name, datatype, dataspace);

    dataset.write(data.data(), datatype);
}

template<typename T, std::size_t DIMENSIONS>
void write_hdf5(H5::H5File file, const std::string& data_set_name, const boost::multi_array<T, DIMENSIONS>& data )
{

    get_hdf5_data_type<T> hdf_data_type;
    do_write_hdf5(file, data_set_name, data, hdf_data_type.type);
}
Leo Goodstadt

In scientific programming it's common to represent a multidimensional array as one big 1D array and then calculate the corresponding offset from the multidimensional indices, e.g. as seen in the answer by Doc Brown.

Alternatively, you can overload the subscript operator (operator[]()) in order to provide an interface that allows the use of multi-dimensional indices backed by the 1D array. Or better yet, use a library which does this, such as Boost multi_array. Or in case your 2D arrays are matrices, you can use a nice C++ linear algebra library such as Eigen.
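
A minimal sketch of the operator[] approach, assuming a hypothetical class named Array2D that stores doubles in a single std::vector:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper: 2D indexing on top of one contiguous buffer.
class Array2D
{
public:
    Array2D(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), buf_(rows * cols) {}

    // m[j] returns a pointer to the start of row j, so m[j][i] works.
    double* operator[](std::size_t j) { return buf_.data() + j * cols_; }
    const double* operator[](std::size_t j) const { return buf_.data() + j * cols_; }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

    // Pointer to the contiguous row-major storage, for calls that
    // expect a flat buffer.
    const double* data() const { return buf_.data(); }

private:
    std::size_t rows_;
    std::size_t cols_;
    std::vector<double> buf_;
};
```

Code can then keep the familiar m[j][i] syntax, while m.data() provides the flat buffer. This sketch does no bounds checking; a library such as Boost multi_array or Eigen offers a more complete interface.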

janneb

I've been struggling with a similar question for some time too. For various reasons I need to process a data stream in C++, but eventually I would like to analyze the resulting HDF5 file in Python, using the benefits of numpy and matplotlib. The solution is simpler than expected. First I declare the dataspace with whatever shape I really need.

hsize_t dims[2] = {rows, cols};         
dataspace = new DataSpace(2, dims);
dataset = new DataSet(group->createDataSet("data", PredType::STD_U16LE, *dataspace));

Next I use a 1D dynamic array and fill it, remembering that element [i][j] is at position [i * cols + j].

unsigned short* hits = new unsigned short[cols * rows];
(...)
hits[i * cols + j] = foo;
(...)

Now the fun part. Since DataSet::write takes a void*, it does not care what you pass. It just takes a contiguous array of elements, and the shape is interpreted from the DataSpace definition. Since our dynamic array is contiguous, of the correct overall size and element ordering, you can simply write it.

dataset->write(hits, PredType::STD_U16LE);

The resulting array is correctly interpreted as 2D if you read your HDF5 file later on.
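
If the data already lives in a pointer-to-pointer array, as in the question, one option is to copy the rows into a contiguous buffer before writing. This is only a sketch under that assumption: the helper name flatten is hypothetical, and the copy costs extra memory for one additional buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: copies an nx-by-ny pointer-to-pointer array into
// one contiguous row-major buffer, so that row j starts at offset j * ny.
std::vector<double> flatten(const double* const* rows,
                            std::size_t nx, std::size_t ny)
{
    std::vector<double> flat(nx * ny);
    for (std::size_t j = 0; j < nx; ++j)
        std::copy(rows[j], rows[j] + ny, flat.data() + j * ny);
    return flat;
}
```

The returned vector's data() pointer can then be passed to write in place of the original double**, which only holds row addresses rather than the elements themselves.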

mak