Questions tagged [ldg]
2 questions
4 votes, 1 answer
CUDA L2 transfer overhead
I have a kernel to test rendering points with atomicMin. The test setup has tons of points in an ideal-case memory layout. Two buffers, one uint32 for clusters of 256x uint32.
namespace Point
{
struct PackedBitfield
{
glm::uint32_t x : 6;
…
FHoenig
- 229
- 1
- 10
1 vote, 2 answers
What are (empirically) sufficient conditions for NVCC to use ldg instead of normal loads?
Section G.4.2 of the CUDA C Programming Guide says:
Data that is read-only for the entire lifetime of the kernel can also be cached in the read-only data cache described in the previous section by reading it using the __ldg() function ... When the…
einpoklum
- 86,754
- 39
- 223
- 453
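For readers new to the tag, here is a minimal sketch of the two load styles the second question contrasts: an explicit `__ldg()` call, and a plain load through a pointer qualified with both `const` and `__restrict__`, which the Programming Guide describes as making it more likely that the compiler detects the read-only condition by itself. The kernel names (`scale_explicit`, `scale_hinted`), the buffer size, and the scale factor are hypothetical and chosen only for illustration. The sketch assumes a device of compute capability 3.5 or later and does not come from either question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Explicit form: __ldg() asks for the load to go through the read-only data cache.
__global__ void scale_explicit(const float* in, float* out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * __ldg(&in[i]);
}

// Hinted form: const + __restrict__ tell the compiler that `in` is read-only
// and not aliased by `out`; whether it then emits a read-only load is up to NVCC.
__global__ void scale_hinted(const float* __restrict__ in,
                             float* __restrict__ out, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = k * in[i];
}

int main()
{
    const int n = 1 << 20;            // illustrative size
    float *in = nullptr, *a = nullptr, *b = nullptr;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&a,  n * sizeof(float));
    cudaMallocManaged(&b,  n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    const int threads = 256;
    const int blocks  = (n + threads - 1) / threads;
    scale_explicit<<<blocks, threads>>>(in, a, 2.0f, n);
    scale_hinted  <<<blocks, threads>>>(in, b, 2.0f, n);
    cudaDeviceSynchronize();

    // Both kernels compute the same values; only the load path may differ.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) if (a[i] != b[i]) ++mismatches;
    printf("mismatches: %d\n", mismatches);

    cudaFree(in); cudaFree(a); cudaFree(b);
    return mismatches != 0;
}
```

Which instruction the hinted kernel actually receives depends on the compiler version and target architecture, so the generated code is worth inspecting (for example with `cuobjdump -sass` on the compiled binary) rather than assuming.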