I am programming with CUDA and have run into a problem copying some data from the host to the GPU.

I have three nested structs like these:

typedef struct {
    char data[128];
    short length;
} Cell;

typedef struct {
    Cell* elements;
    int height;
    int width;
} Matrix;

typedef struct {
    Matrix* tables;
    int count;
} Container;

So a Container "includes" some Matrix elements, each of which in turn includes some Cell elements.

Let's suppose I dynamically allocate the host memory in this way:

Container c;
c.count = 20;
c.tables = malloc(c.count * sizeof(Matrix));

for (int i = 0; i < c.count; i++) {
    Matrix m;
    m.elements = malloc(100 * sizeof(Cell));  /* height * width == 100 */
    c.tables[i] = m;
}

That is, a Container of 20 Matrix elements with 100 Cells each.

  • How can I now copy this data to device memory using cudaMemcpy()?
  • Is there a good way to perform a deep copy of a "struct of structs" from host to device?

Thanks for your time.

Andrea


1 Answer

The short answer is "just don't". There are four reasons why I say that:

  1. There is no deep copy functionality in the API
  2. The code you would have to write to set up and copy the structure you have described to the GPU would be ridiculously complex (about 4000 API calls at a minimum, and probably an intermediate kernel, for your example of 20 Matrix elements of 100 Cells each)
  3. The GPU code using three levels of pointer indirection will have massively increased memory access latency and will break what little cache coherency is available on the GPU
  4. If you want to copy the data back to the host afterwards, you have the same problem in reverse

Consider using linear memory and indexing instead. It is portable between host and GPU, and the allocation and copy overhead is about 1% of that of the pointer-based alternative.
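
As a rough illustration of what "linear memory and indexing" could look like for the structs in the question, here is a minimal sketch. `FlatContainer`, `flat_alloc`, `flat_index` and `flat_bytes` are hypothetical names introduced only for this example, and the sketch assumes every table has the same height and width, which the question does not actually state.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char data[128];
    short length;
} Cell;

/* Hypothetical flat replacement for Container/Matrix: one contiguous
   block of cells plus the dimensions needed to index into it.
   Assumption: all tables share the same height and width. */
typedef struct {
    Cell *cells;   /* count * height * width cells, stored table after table */
    int count;     /* number of tables */
    int height;    /* rows per table */
    int width;     /* columns per table */
} FlatContainer;

/* Allocate one zero-initialised block for all cells.
   Returns 0 on success, -1 if the allocation fails. */
static int flat_alloc(FlatContainer *fc, int count, int height, int width)
{
    fc->count = count;
    fc->height = height;
    fc->width = width;
    fc->cells = calloc((size_t)count * height * width, sizeof(Cell));
    return fc->cells != NULL ? 0 : -1;
}

/* Offset of cell (row, col) of table t inside the flat block. */
static size_t flat_index(const FlatContainer *fc, int t, int row, int col)
{
    assert(t >= 0 && t < fc->count);
    assert(row >= 0 && row < fc->height);
    assert(col >= 0 && col < fc->width);
    return ((size_t)t * fc->height + row) * fc->width + col;
}

/* Size in bytes of the whole block, i.e. the amount a single copy
   would have to transfer. */
static size_t flat_bytes(const FlatContainer *fc)
{
    return (size_t)fc->count * fc->height * fc->width * sizeof(Cell);
}
```

With a layout like this, the 20 tables of 100 cells (taken here as 10 x 10, which is an assumption) live in one buffer of `flat_bytes(&fc)` bytes, so a single `cudaMalloc` and a single `cudaMemcpy` of that size should be enough to move everything to the device. The three integers can be passed to the kernel by value, and the same arithmetic as in `flat_index` can be used on the GPU to reach a cell as `cells[flat_index(...)]`.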

If you really want to do this, leave a comment and I will try and dig up some old code examples which show what a complete folly nested pointers are on the GPU.

talonmies
  • I have read a lot about using linear memory and flattening arrays. Actually, I already manage the `elements` field in the `Matrix` struct as linear memory, while the high-level representation is 2D. I would try to linearize/flatten the three structures as well, but how could I do this? Wouldn't it be too difficult to manage all the indexes? Anyway, thanks for your help, and do not bother looking for the old code! – Andrea Jul 03 '11 at 19:06
  • @talonmies I am interested in seeing some old code examples you had about nested pointers – A_Man Oct 01 '20 at 14:32