Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

Before posting CUDA questions, please read "How to get Useful Answers to your CUDA Questions on Stack Overflow" below.

CUDA has an online documentation repository, updated with each release, including references for APIs and libraries; user guides for applications; and a detailed CUDA C/C++ Programming Guide.

The CUDA platform enables application development using several languages and associated APIs, including CUDA C/C++ (compiled with nvcc), CUDA Fortran, and the CUDA runtime and driver APIs.

There are also third-party bindings for using CUDA in other languages and programming environments, such as ManagedCuda for .NET languages (including C#).
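
For readers new to the platform, here is a minimal sketch of what a CUDA C/C++ program looks like: allocate device memory, launch a kernel across a grid of threads, and copy the result back to the host. The scale kernel, the array size, and the launch configuration are arbitrary illustrative choices, not anything prescribed by CUDA.

    // Minimal CUDA C/C++ sketch: device memory, a kernel launch, and a copy back.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
        if (i < n) data[i] *= factor;                   // guard against stray threads
    }

    int main()
    {
        const int n = 256;
        std::vector<float> host(n, 1.0f);

        float *dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        scale<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);  // grid of 128-thread blocks

        cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        std::printf("host[0] = %f\n", host[0]);  // expected: 2.000000
        return 0;
    }

Compile it with nvcc (for example, nvcc scale.cu -o scale, where the file name is a placeholder). Real code should also check the return status of every call, as described in the suggestions below.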

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report, you should discuss them on the CUDA forums or report them via the registered developer portal. You may also want to cross-link to any such discussion from your question here on SO.

How to get Useful Answers to your CUDA Questions on Stack Overflow

Here are a number of suggestions for users new to CUDA. Follow them before asking your question and you are much more likely to get a satisfactory answer!

  • Always check the result codes returned by CUDA API functions to ensure you are getting cudaSuccess. If you are not, and you don't know why, include the error information in your question. This includes checking for errors caused by the most recent kernel launch, which may not be reported until you have called cudaDeviceSynchronize() or cudaStreamSynchronize(). More on checking for errors in CUDA in this question; a minimal error-checking sketch also follows this list.
  • If you are getting an unspecified launch failure, your kernel is most likely making an out-of-bounds or otherwise illegal memory access, the device-side equivalent of a segmentation fault. Verify that your indexing is correct and check whether the CUDA Compute Sanitizer (or the older cuda-memcheck) reports any errors.
  • The debugger for CUDA, cuda-gdb, is also very useful when you are not sure what is going on. You can inspect resources at the warp, thread, block, SM, and grid level, and step through your program's execution. If your program crashes, cuda-gdb can help you find where the crash occurred and inspect the context.
  • If you are getting syntax errors on CUDA keywords when compiling device code, make sure you are compiling with nvcc (or clang with CUDA support enabled) and that your source file has the expected .cu extension. If CUDA device functions or features you expect to work are reported as undefined (atomic functions, warp vote functions, half-precision arithmetic, cooperative groups, etc.), ensure that you are explicitly passing compilation arguments that target an architecture which supports those features (for example, an appropriate -arch or -gencode setting).
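
As a concrete illustration of the first suggestion, here is a minimal error-checking sketch. The gpuErrchk macro name and the increment kernel are illustrative only, not part of the CUDA toolkit; the important pattern is wrapping every runtime API call and pairing cudaGetLastError() with cudaDeviceSynchronize() after a kernel launch.

    // Minimal error-checking sketch (illustrative names: gpuErrchk, increment).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    #define gpuErrchk(call)                                                    \
        do {                                                                   \
            cudaError_t err_ = (call);                                         \
            if (err_ != cudaSuccess) {                                         \
                std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                             cudaGetErrorString(err_), __FILE__, __LINE__);    \
                std::exit(EXIT_FAILURE);                                       \
            }                                                                  \
        } while (0)

    __global__ void increment(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;   // bounds check avoids out-of-range accesses
    }

    int main()
    {
        const int n = 1 << 20;
        int *d_data = nullptr;
        gpuErrchk(cudaMalloc(&d_data, n * sizeof(int)));
        gpuErrchk(cudaMemset(d_data, 0, n * sizeof(int)));

        increment<<<(n + 255) / 256, 256>>>(d_data, n);
        gpuErrchk(cudaGetLastError());        // errors from the launch itself
        gpuErrchk(cudaDeviceSynchronize());   // errors raised while the kernel ran

        gpuErrchk(cudaFree(d_data));
        return 0;
    }

If a run still fails, compute-sanitizer ./your-program and cuda-gdb ./your-program are the next steps, and a specific GPU architecture can be targeted at compile time with, for example, nvcc -arch=sm_70 example.cu (the program and file names here are placeholders).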


13112 questions

635 votes, 24 answers
How to get the CUDA version?
Is there any quick command or script to check for the version of CUDA installed? I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed version or not.
asked by Hailiang Zhang

351 votes, 22 answers
NVIDIA NVML Driver/library version mismatch
When I run nvidia-smi I get the following message: Failed to initialize NVML: Driver/library version mismatch An hour ago I received the same message and uninstalled my cuda library and I was able to run nvidia-smi, getting the following…
asked by etal

272 votes, 4 answers
What is the canonical way to check for errors using the CUDA runtime API?
Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError,…
asked by talonmies

217 votes, 6 answers
Which TensorFlow and CUDA version combinations are compatible?
I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow…
asked by whiletrue

182 votes, 10 answers
Using GPU from a docker container?
I'm searching for a way to use the GPU from inside a docker container. The container will execute arbitrary code so I don't want to use the privileged mode. Any tips? From previous research I understood that run -v and/or LXC cgroup was the way to…
asked by Regan

180 votes, 10 answers
How to verify CuDNN installation?
I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to verify CuDNN is installed. Help will be much…
asked by alfredox

164 votes, 2 answers
Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)
How are threads organized to be executed by a GPU?
asked by cibercitizen1

163 votes, 16 answers
A top-like utility for monitoring CUDA activity on a GPU
I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?
asked by natorro

155 votes, 2 answers
How do CUDA blocks/warps/threads map onto CUDA cores?
I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/thread. I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern. First of…
asked by Daedalus

153 votes, 4 answers
Different CUDA versions shown by nvcc and NVIDIA-smi
I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So when I run : $ which nvcc …
asked by yuqli

153 votes, 5 answers
Using Java with Nvidia GPUs (CUDA)
I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with huge amount of data. We ordered some CUDA GPUs to try it with and since Java is not supported by CUDA, I'm…
asked by Hans

120 votes, 3 answers
How do I choose grid and block dimensions for CUDA kernels?
This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here. Following this link, the answer from talonmies contains a code snippet (see below). I don't understand the…
asked by user1292251

120 votes, 9 answers
Difference between global and device functions
Can anyone describe the differences between __global__ and __device__? When should I use __device__, and when should I use __global__?
asked by Mehdi Saman Booy

115 votes, 7 answers
GPU Emulator for CUDA programming without the hardware
Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my…
asked by Narcolapser

114 votes, 21 answers
CUDA incompatible with my gcc version
I have troubles compiling some of the examples shipped with CUDA SDK. I have installed the developers driver (version 270.41.19) and the CUDA toolkit, then finally the SDK (both the 4.0.17 version). Initially it didn't compile at all giving: error…
asked by fbielejec