Questions tagged [cuda]

CUDA (Compute Unified Device Architecture) is a parallel computing platform and programming model for NVIDIA GPUs (Graphics Processing Units). CUDA provides an interface to NVIDIA GPUs through a variety of programming languages, libraries, and APIs.

Before posting CUDA questions, please read "How to get Useful Answers to your CUDA Questions on Stack Overflow" below.

CUDA has an online documentation repository, updated with each release, including references for APIs and libraries; user guides for applications; and a detailed CUDA C/C++ Programming Guide.

The CUDA platform enables application development using several languages and associated APIs, including CUDA C/C++ (compiled with nvcc), CUDA Fortran, and the CUDA runtime and driver APIs.

There are also third-party bindings for using CUDA in other languages and programming environments, such as ManagedCuda for .NET languages (including C#).
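
For readers new to the platform, here is a minimal sketch of what a CUDA C/C++ program looks like: allocate device memory, launch a kernel across a grid of threads, and copy the result back to the host. The scale kernel, the array size, and the launch configuration are arbitrary illustrative choices, not anything prescribed by CUDA.

    // Minimal CUDA C/C++ sketch: device memory, a kernel launch, and a copy back.
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
        if (i < n) data[i] *= factor;                   // guard against stray threads
    }

    int main()
    {
        const int n = 256;
        std::vector<float> host(n, 1.0f);

        float *dev = nullptr;
        cudaMalloc(&dev, n * sizeof(float));
        cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        scale<<<(n + 127) / 128, 128>>>(dev, 2.0f, n);  // grid of 128-thread blocks

        cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(dev);

        std::printf("host[0] = %f\n", host[0]);  // expected: 2.000000
        return 0;
    }

Compile it with nvcc (for example, nvcc scale.cu -o scale, where the file name is a placeholder). Real code should also check the return status of every call, as described in the suggestions below.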

You should ask questions about CUDA here on Stack Overflow, but if you have bugs to report, you should discuss them on the CUDA forums or report them via the registered developer portal. You may also want to cross-link to any such discussion from your question here on SO.

How to get Useful Answers to your CUDA Questions on Stack Overflow

Here are a number of suggestions for users new to CUDA. Follow them before asking your question and you are much more likely to get a satisfactory answer!

  • Always check the result codes returned by CUDA API functions to ensure you are getting cudaSuccess. If you are not, and you don't know why, include the error information in your question. This includes checking for errors caused by the most recent kernel launch, which may not be reported until you have called cudaDeviceSynchronize() or cudaStreamSynchronize(). More on checking for errors in CUDA in this question; a minimal error-checking sketch also follows this list.
  • If you are getting an unspecified launch failure, your kernel is most likely making an out-of-bounds or otherwise illegal memory access, the device-side equivalent of a segmentation fault. Verify that your indexing is correct and check whether the CUDA Compute Sanitizer (or the older cuda-memcheck) reports any errors.
  • The debugger for CUDA, cuda-gdb, is also very useful when you are not sure what is going on. You can inspect resources at the warp, thread, block, SM, and grid level, and step through your program's execution. If your program crashes, cuda-gdb can help you find where the crash occurred and inspect the context.
  • If you are getting syntax errors on CUDA keywords when compiling device code, make sure you are compiling with nvcc (or clang with CUDA support enabled) and that your source file has the expected .cu extension. If CUDA device functions or features you expect to work are reported as undefined (atomic functions, warp vote functions, half-precision arithmetic, cooperative groups, etc.), ensure that you are explicitly passing compilation arguments that target an architecture which supports those features (for example, an appropriate -arch or -gencode setting).
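
As a concrete illustration of the first suggestion, here is a minimal error-checking sketch. The gpuErrchk macro name and the increment kernel are illustrative only, not part of the CUDA toolkit; the important pattern is wrapping every runtime API call and pairing cudaGetLastError() with cudaDeviceSynchronize() after a kernel launch.

    // Minimal error-checking sketch (illustrative names: gpuErrchk, increment).
    #include <cstdio>
    #include <cstdlib>
    #include <cuda_runtime.h>

    #define gpuErrchk(call)                                                    \
        do {                                                                   \
            cudaError_t err_ = (call);                                         \
            if (err_ != cudaSuccess) {                                         \
                std::fprintf(stderr, "CUDA error '%s' at %s:%d\n",             \
                             cudaGetErrorString(err_), __FILE__, __LINE__);    \
                std::exit(EXIT_FAILURE);                                       \
            }                                                                  \
        } while (0)

    __global__ void increment(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1;   // bounds check avoids out-of-range accesses
    }

    int main()
    {
        const int n = 1 << 20;
        int *d_data = nullptr;
        gpuErrchk(cudaMalloc(&d_data, n * sizeof(int)));
        gpuErrchk(cudaMemset(d_data, 0, n * sizeof(int)));

        increment<<<(n + 255) / 256, 256>>>(d_data, n);
        gpuErrchk(cudaGetLastError());        // errors from the launch itself
        gpuErrchk(cudaDeviceSynchronize());   // errors raised while the kernel ran

        gpuErrchk(cudaFree(d_data));
        return 0;
    }

If a run still fails, compute-sanitizer ./your-program and cuda-gdb ./your-program are the next steps, and a specific GPU architecture can be targeted at compile time with, for example, nvcc -arch=sm_70 example.cu (the program and file names here are placeholders).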


13112 questions

635 votes, 24 answers
How to get the CUDA version?
Is there any quick command or script to check for the version of CUDA installed? I found the manual of 4.0 under the installation directory but I'm not sure whether it is of the actual installed version or not.
asked by Hailiang Zhang

351 votes, 22 answers
NVIDIA NVML Driver/library version mismatch
When I run nvidia-smi I get the following message: Failed to initialize NVML: Driver/library version mismatch An hour ago I received the same message and uninstalled my cuda library and I was able to run nvidia-smi, getting the following…
asked by etal

272 votes, 4 answers
What is the canonical way to check for errors using the CUDA runtime API?
Looking through the answers and comments on CUDA questions, and in the CUDA tag wiki, I see it is often suggested that the return status of every API call should be checked for errors. The API documentation contains functions like cudaGetLastError,…
asked by talonmies

217 votes, 6 answers
Which TensorFlow and CUDA version combinations are compatible?
I have noticed that some newer TensorFlow versions are incompatible with older CUDA and cuDNN versions. Does an overview of the compatible versions or even a list of officially tested combinations exist? I can't find it in the TensorFlow…
asked by whiletrue

182 votes, 10 answers
Using GPU from a docker container?
I'm searching for a way to use the GPU from inside a docker container. The container will execute arbitrary code so I don't want to use the privileged mode. Any tips? From previous research I understood that run -v and/or LXC cgroup was the way to…
asked by Regan

180 votes, 10 answers
How to verify CuDNN installation?
I have searched many places but ALL I get is HOW to install it, not how to verify that it is installed. I can verify my NVIDIA driver is installed, and that CUDA is installed, but I don't know how to verify CuDNN is installed. Help will be much…
asked by alfredox

164 votes, 2 answers
Understanding CUDA grid dimensions, block dimensions and threads organization (simple explanation)
How are threads organized to be executed by a GPU?
asked by cibercitizen1

163 votes, 16 answers
A top-like utility for monitoring CUDA activity on a GPU
I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?
asked by natorro

155 votes, 2 answers
How do CUDA blocks/warps/threads map onto CUDA cores?
I have been using CUDA for a few weeks, but I have some doubts about the allocation of blocks/warps/thread. I am studying the architecture from a didactic point of view (university project), so reaching peak performance is not my concern. First of…
asked by Daedalus

153 votes, 4 answers
Different CUDA versions shown by nvcc and NVIDIA-smi
I am very confused by the different CUDA versions shown by running which nvcc and nvidia-smi. I have both cuda9.2 and cuda10 installed on my ubuntu 16.04. Now I set the PATH to point to cuda9.2. So when I run : $ which nvcc …
asked by yuqli

153 votes, 5 answers
Using Java with Nvidia GPUs (CUDA)
I'm working on a business project that is done in Java, and it needs huge computation power to compute business markets. Simple math, but with huge amount of data. We ordered some CUDA GPUs to try it with and since Java is not supported by CUDA, I'm…
asked by Hans

120 votes, 3 answers
How do I choose grid and block dimensions for CUDA kernels?
This is a question about how to determine the CUDA grid, block and thread sizes. This is an additional question to the one posted here. Following this link, the answer from talonmies contains a code snippet (see below). I don't understand the…
asked by user1292251

120 votes, 9 answers
Difference between global and device functions
Can anyone describe the differences between __global__ and __device__? When should I use __device__, and when should I use __global__?
asked by Mehdi Saman Booy

115 votes, 7 answers
GPU Emulator for CUDA programming without the hardware
Question: Is there an emulator for a Geforce card that would allow me to program and test CUDA without having the actual hardware? Info: I'm looking to speed up a few simulations of mine in CUDA, but my problem is that I'm not always around my…
asked by Narcolapser

114 votes, 21 answers
CUDA incompatible with my gcc version
I have troubles compiling some of the examples shipped with CUDA SDK. I have installed the developers driver (version 270.41.19) and the CUDA toolkit, then finally the SDK (both the 4.0.17 version). Initially it didn't compile at all giving: error…
asked by fbielejec