
We've successfully used the following post to help create structs that contain basic types like int *. Textures provide a nice performance boost for read-only arrays. We use many of them, which makes the argument lists for the kernels and the kernel sub-functions long and complicated. We'd like to embed the texture objects in structures to shorten and simplify those argument lists.

Copying a struct containing pointers to CUDA device

Here's a snippet representing the code methodology we use. It compiles, but crashes at run-time.

// Initialize texture description
cudaTextureDesc textureDescription;
memset(&textureDescription, 0, sizeof(textureDescription));
textureDescription.readMode = cudaReadModeElementType;

// Create Texture from variable
cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);

// These declarations are in the .h file
typedef struct SampleStructure {
   cudaTextureObject_t texture;
} SampleStructure;
SampleStructure *structureHost;
SampleStructure *structureDevice;

// Create host and device structures
structureHost = (SampleStructure *)malloc(sizeof(SampleStructure));
cudaMalloc(&structureDevice, sizeof(SampleStructure));

// Assign the texture object to the host structure
structureHost->texture = texture;

// Copy the host structure to Global Memory
cudaMemcpy(structureDevice, structureHost, sizeof(SampleStructure), cudaMemcpyHostToDevice);

// Pass Texture and Texture-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureDevice);

...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure *structureDevice) {
    int index = blockIdx.x * blockDim.x + threadIdx.x; // element to read
    int value;
    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
    value = tex1Dfetch<int>(structureDevice->texture, index); // Crashes at runtime
}

When using the "texture" variable in the kernel code (or sub-function), it runs correctly. When using "structureDevice->texture" instead, it crashes at run-time.
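The question does not say which error the runtime reported. A minimal sketch for surfacing it, assuming only the CUDA runtime API; `CUDA_CHECK` and `CUDA_CHECK_KERNEL` are hypothetical macro names, not part of CUDA. A `<<<...>>>` launch returns no status, so the second macro asks `cudaGetLastError()` about the launch itself and `cudaDeviceSynchronize()` about faults raised while the kernel was running.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): run a CUDA runtime call and,
// if it does not return cudaSuccess, print the location, the call, and the
// runtime's error text, then exit.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                         __LINE__, #call, cudaGetErrorString(err_));     \
            std::exit(EXIT_FAILURE);                                     \
        }                                                                \
    } while (0)

// Hypothetical helper: call right after a <<<...>>> launch.
// cudaGetLastError() reports errors in the launch itself (for example a bad
// launch configuration); cudaDeviceSynchronize() waits for the kernel to
// finish and reports faults raised while it ran, such as an invalid memory access.
#define CUDA_CHECK_KERNEL()                   \
    do {                                      \
        CUDA_CHECK(cudaGetLastError());       \
        CUDA_CHECK(cudaDeviceSynchronize());  \
    } while (0)
```

Placing `CUDA_CHECK_KERNEL();` directly after `kernel<<<1,1>>>(texture, structureDevice);` should turn the crash into a named runtime error message, which narrows down where the failure comes from.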

Can someone show a simple example of how to embed a texture object in a struct that is passed to a kernel and runs without crashing? Or can someone point out the mistake in the code we've presented?

roger1994
  • Why don't you just pass the structure by value, rather than by reference? – talonmies Nov 29 '15 at 20:17
  • @talonmies Passing by value causes a runtime error in the kernel just by making the kernel call. I modified the argument list to accept the structure by value, but passing a structure that way causes a runtime failure. kernel<<<1,1>>>(texture, *structureDevice); – roger1994 Nov 30 '15 at 05:27
  • That isn't passing the structure by value. That is dereferencing a device pointer on the host, which is obviously illegal. Just make a structure in host memory and pass it by value. Supported architectures can use argument lists up to 4 KB in size, so there is no practical size limitation in using pass by value. – talonmies Nov 30 '15 at 05:36
  • @talonmies Passing by value worked! Thank you! – roger1994 Nov 30 '15 at 18:44
  • Feel free to go ahead and add what you did as an answer. Later you will be able to accept the answer, which gets the question off the unanswered list and makes it more visible in search (I'll upvote it too..) – talonmies Nov 30 '15 at 18:49
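Following talonmies' advice, a minimal sketch of the pattern that motivated the question: several read-only arrays bundled into one structure that is filled in host memory and passed to the kernel by value. `GatherParams`, `makeIntTexture`, and `gatherKernel` are hypothetical names; the texture descriptor settings are copied from the question. Because `cudaTextureObject_t` is an `unsigned long long` handle, the structure is plain data and is copied into the launch like any other argument. The `static_assert` encodes the 4 KB limit quoted in the comment; newer CUDA releases raise that limit on recent GPUs, so treat 4096 as a conservative bound.

```cuda
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical bundle of read-only inputs. Each texture object is a 64-bit
// handle, so the struct is plain data and can be passed to a kernel by value.
struct GatherParams {
    cudaTextureObject_t values;   // int array read through the texture path
    cudaTextureObject_t indices;  // int positions into `values`
    int count;                    // number of outputs to produce
};

// Compile-time guard for the 4 KB kernel-argument limit quoted in the comments.
static_assert(sizeof(GatherParams) <= 4096, "kernel argument bundle too large");

// Hypothetical helper: wrap an existing device array of ints in a linear
// texture object, using the same descriptor settings as the question.
cudaTextureObject_t makeIntTexture(int* devPtr, int count) {
    cudaResourceDesc resource;
    std::memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = devPtr;
    resource.res.linear.desc.f = cudaChannelFormatKindSigned;
    resource.res.linear.desc.x = 32;  // bits per channel
    resource.res.linear.sizeInBytes = count * sizeof(int);

    cudaTextureDesc textureDescription;
    std::memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    cudaTextureObject_t texture = 0;
    if (cudaCreateTextureObject(&texture, &resource, &textureDescription, nullptr) != cudaSuccess)
        std::abort();
    return texture;
}

// The struct arrives by value: the kernel never dereferences a pointer to it.
__global__ void gatherKernel(GatherParams p, int* out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < p.count) {
        int j = tex1Dfetch<int>(p.indices, i);  // first texture: where to read
        out[i] = tex1Dfetch<int>(p.values, j);  // second texture: what to read
    }
}

// The struct is filled in host memory and handed straight to the launch.
void launchGather(const GatherParams& params, int* deviceOut) {
    assert(params.count > 0);
    const int threads = 128;
    const int blocks = (params.count + threads - 1) / threads;
    gatherKernel<<<blocks, threads>>>(params, deviceOut);
}
```

With this layout, adding another read-only array means adding one field to the structure rather than another parameter to every kernel and sub-function signature.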

1 Answer


Passing the structure by value produced a working solution. Here is the equivalent code that works. Thanks to @talonmies for the suggestion.

While a structure simplifies the argument list, it can slow down execution when it is reached through a pointer, because the kernel then has to make 2 reads from Global Memory instead of 1: one to get the structure and one to get the texture. To improve performance, the structure can be copied to shared memory once per block, and the kernel then reads the texture object from that copy. Using the structure from shared memory improves performance.

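// Create the Texture Object (same description settings as in the question)
cudaTextureDesc textureDescription;
memset(&textureDescription, 0, sizeof(textureDescription));
textureDescription.readMode = cudaReadModeElementType;

cudaTextureObject_t texture = 0;
cudaResourceDesc resource;
memset(&resource, 0, sizeof(resource));
resource.resType = cudaResourceTypeLinear;
resource.res.linear.devPtr = intArray;
resource.res.linear.desc.f = cudaChannelFormatKindSigned;
resource.res.linear.desc.x = 32; // bits per channel
resource.res.linear.sizeInBytes = count*sizeof(int);
cudaCreateTextureObject(&texture, &resource, &textureDescription, NULL);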

// These structure declarations are in the .h file
typedef struct SampleStructure {
   cudaTextureObject_t texture;
} SampleStructure;
SampleStructure structureHost;

// Assign the texture object to the host structure
structureHost.texture = texture;

// Pass Texture and Texture-object-embedded-in-structure to kernel
kernel<<<1,1>>>(texture, structureHost);

...
__global__ void
kernel(cudaTextureObject_t texture, SampleStructure structureDevice) {
    __shared__ SampleStructure structureSharedMemory;

    // Copy the structure to shared memory for faster access
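    if (threadIdx.x == 0)
       structureSharedMemory = structureDevice;
    // Barrier: wait until thread 0's copy is visible to every thread in the block.
    // (__threadfence_block() only orders memory accesses; it does not make threads wait.)
    __syncthreads();

    int index = blockIdx.x * blockDim.x + threadIdx.x; // element to read
    int value;
    value = tex1Dfetch<int>(texture, index); // Runs successfully at runtime
    value = tex1Dfetch<int>(structureSharedMemory.texture, index); // Runs successfully at runtime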
}
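For the "simple code" the question asks for, a self-contained sketch of this answer's approach under a few assumptions: the `CHECK` macro, the host function `readThroughStruct`, and the launch configuration are hypothetical. The structure lives in host memory, is passed to the kernel by value, and thread 0 of each block stages it in shared memory. The kernel uses `__syncthreads()` after the copy because a barrier is what guarantees the other threads see it; `__threadfence_block()` only orders memory operations and does not make threads wait.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Minimal error check for this sketch: print the failing call and abort.
#define CHECK(call)                                                      \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            std::fprintf(stderr, "%s failed: %s\n", #call,               \
                         cudaGetErrorString(err_));                      \
            std::abort();                                                \
        }                                                                \
    } while (0)

typedef struct SampleStructure {
    cudaTextureObject_t texture;
} SampleStructure;

// The structure arrives by value; thread 0 copies it to shared memory and the
// barrier makes that copy visible to the whole block before anyone reads it.
__global__ void kernel(SampleStructure structureParam, int* out, int count) {
    __shared__ SampleStructure structureSharedMemory;
    if (threadIdx.x == 0)
        structureSharedMemory = structureParam;
    __syncthreads();

    int index = blockIdx.x * blockDim.x + threadIdx.x;
    if (index < count)
        out[index] = tex1Dfetch<int>(structureSharedMemory.texture, index);
}

// Hypothetical host driver: uploads hostIn, reads every element back through
// the texture object embedded in the structure, and writes it to hostOut.
void readThroughStruct(const int* hostIn, int* hostOut, int count) {
    assert(count > 0 && "count must be positive");
    size_t bytes = count * sizeof(int);
    int* intArray = nullptr;
    int* deviceOut = nullptr;
    CHECK(cudaMalloc(&intArray, bytes));
    CHECK(cudaMalloc(&deviceOut, bytes));
    CHECK(cudaMemcpy(intArray, hostIn, bytes, cudaMemcpyHostToDevice));

    cudaResourceDesc resource;
    std::memset(&resource, 0, sizeof(resource));
    resource.resType = cudaResourceTypeLinear;
    resource.res.linear.devPtr = intArray;
    resource.res.linear.desc.f = cudaChannelFormatKindSigned;
    resource.res.linear.desc.x = 32;  // bits per channel
    resource.res.linear.sizeInBytes = bytes;

    cudaTextureDesc textureDescription;
    std::memset(&textureDescription, 0, sizeof(textureDescription));
    textureDescription.readMode = cudaReadModeElementType;

    cudaTextureObject_t texture = 0;
    CHECK(cudaCreateTextureObject(&texture, &resource, &textureDescription, nullptr));

    SampleStructure structureHost;  // lives in host memory
    structureHost.texture = texture;

    const int threads = 128;
    const int blocks = (count + threads - 1) / threads;
    kernel<<<blocks, threads>>>(structureHost, deviceOut, count);  // passed by value
    CHECK(cudaGetLastError());
    // cudaMemcpy waits for the kernel, so runtime faults also surface here.
    CHECK(cudaMemcpy(hostOut, deviceOut, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaDestroyTextureObject(texture));
    CHECK(cudaFree(deviceOut));
    CHECK(cudaFree(intArray));
}
```

Calling `readThroughStruct(in, out, n)` from `main` should leave `out` equal to `in`, since every element is read back through the texture object stored in the structure.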
roger1994