CUDA memory debugging

Two options for debugging memory errors in CUDA programs are available in the CUDA section of the Run window.

See Prepare to debug CUDA GPU code before debugging the memory of a CUDA application.

When Track GPU allocations is enabled, CUDA memory allocations made by the host are tracked; that is, allocations made using functions such as cudaMalloc. You can find out how much memory is allocated, and where it was allocated from, using the Current Memory Usage window.

Note

CUDA memory allocations made by the GPU/device, such as those made with cuMemAlloc, are currently not tracked, nor are Unified Memory allocations made with cudaMallocManaged. Memory allocated for CUDA arrays with functions such as cudaMallocArray is also not tracked.

Allocations are tracked separately for each GPU and the host. If you enable Track GPU allocations, host-only memory allocations made using functions such as malloc will be tracked as well. You can choose between GPUs using the drop-down list in the top-right corner of the Memory Usage and Memory Statistics windows.
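As a rough illustration of the notes above, the sketch below marks which allocations in a small, hypothetical program would be tracked when Track GPU allocations is enabled (the buffer names and sizes are invented for the example):

    #include <cstdlib>
    #include <cuda_runtime.h>

    int main() {
        // Tracked: host-side device allocation through the runtime API.
        float *d_buf = nullptr;
        cudaMalloc(&d_buf, 1024 * sizeof(float));

        // Tracked when Track GPU allocations is enabled: plain host allocation.
        float *h_buf = static_cast<float *>(malloc(1024 * sizeof(float)));

        // Not tracked: Unified Memory allocation.
        float *u_buf = nullptr;
        cudaMallocManaged(&u_buf, 1024 * sizeof(float));

        // Not tracked: CUDA array allocation.
        cudaArray_t arr = nullptr;
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaMallocArray(&arr, &desc, 64, 64);

        cudaFreeArray(arr);
        cudaFree(u_buf);
        free(h_buf);
        cudaFree(d_buf);
        return 0;
    }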

The Detect invalid accesses (memcheck) option switches on the CUDA-MEMCHECK error detection tool. This tool can detect problems such as out-of-bounds and misaligned global memory accesses, and syscall errors, such as calling free() in a kernel on an already freed pointer.
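For example, a hypothetical kernel with an off-by-one bounds check writes one element past the end of its buffer; with Detect invalid accesses (memcheck) enabled, such an out-of-bounds global write can be reported at the offending access rather than corrupting memory silently:

    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // Off-by-one: the condition should be i < n. The thread with
        // i == n writes one element past the end of the allocation,
        // which memcheck reports as an invalid global write.
        if (i <= n) {
            data[i] *= 2.0f;
        }
    }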

Other CUDA hardware exceptions (such as a stack overflow) are detected regardless of whether this option is selected.
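As a sketch of one such exception, a device function that recurses too deeply can exhaust its per-thread stack (the function and recursion depth here are invented, and an optimizing compiler may transform the recursion, so treat this only as an illustration):

    __device__ int sum_to(int n) {
        // Deep recursion consumes per-thread stack space; for a large
        // enough n this overflows the stack and raises a CUDA hardware
        // exception, which is detected whether or not memcheck is on.
        if (n <= 0) return 0;
        return n + sum_to(n - 1);
    }

    __global__ void stack_overflow_kernel(int *out) {
        *out = sum_to(1 << 20);
    }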

Note

Detect invalid accesses (memcheck) is not supported with CUDA 12.

For further details about CUDA hardware exceptions, see the NVIDIA documentation.

Note

It is not possible to track GPU allocations created by an OpenACC compiler, because the generated code does not call cudaMalloc directly.