Known issues and limitations

This section provides information about known issues and limitations with NVIDIA CUDA debugging.

Environment

From CUDA 13.0 onward, NVIDIA GPU Debugging support in Linaro DDT requires cuda-gdb to be available in your environment. This cuda-gdb must be compatible with the NVIDIA GPU Driver installed on the system.

Linaro DDT will fail to start if CUDA is selected in the Run Dialog and cuda-gdb is not detected in the environment. cuda-gdb is available in the standard CUDA Toolkit installation.

If cuda-gdb fails to start, you can still debug your application by using the command-line option --no-cuda or by deselecting CUDA in the Run Dialog. However, NVIDIA GPU Debugging will not be possible.
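Because DDT depends on finding cuda-gdb in the environment, it can help to check for it before launching. The following is a minimal sketch, not part of Linaro Forge: the function name find_cuda_gdb is hypothetical, and the check only confirms that an executable named cuda-gdb is on PATH, not that it is compatible with the installed driver.

```shell
#!/bin/sh
# Hypothetical helper (not part of Linaro Forge): print the path of the
# cuda-gdb executable found on PATH, or return non-zero if there is none.
find_cuda_gdb() {
    command -v cuda-gdb
}

if gdb_path=$(find_cuda_gdb); then
    echo "cuda-gdb found at: $gdb_path"
else
    # Without cuda-gdb, only host-side debugging (--no-cuda) is possible.
    echo "cuda-gdb not found: add the CUDA Toolkit to PATH or use --no-cuda" >&2
fi
```

If the check fails on a system where the CUDA Toolkit is installed, the toolkit's bin directory is probably missing from PATH.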

Limitations with system cuda-gdb

This section provides information about known issues and limitations that apply to CUDA debugging when using the cuda-gdb from the environment.

  • The contents of libstdc++ containers and container iterators are unreadable.

  • Variables in an anonymous namespace cannot be referenced in evaluations using the (anonymous namespace)::myvar syntax.

  • A segmentation fault may occur when attaching to a CUDA process.

  • Only the first element of an array variable on the device is displayed by default. As a workaround, use the View As Vector context menu item.

  • Function evaluation in expressions when using OpenMP is not supported.

  • A CUDA Kernel Breakpoint may not be hit in a Fortran program if another breakpoint has been set.

Additionally, see Host-side debugging limitations for the differences that you can expect in host-side debugging when GPU debugging support is enabled.

Debug multiple NVIDIA GPU processes

CUDA allows debugging of multiple CUDA processes on the same node. However, each process will still attempt to reserve all of the available GPUs for debugging.

This works for the case where a single process debugs all GPUs on a node, but not for multiple processes debugging a single GPU.

A temporary workaround when using Open MPI is to export the following environment variable before starting DDT:

FORGE_CUDA_DEVICE_VAR=OMPI_COMM_WORLD_LOCAL_RANK

This will assign a single device (based on local rank) to each process.

In addition:

  • You must select File ‣ Options ‣ Open MPI (Compatibility) (Linaro Forge ‣ Preferences on Mac OS X). (Do not select Open MPI).

  • The device selected for each process will be the only device visible when enumerating GPUs. This causes manual GPU selection code to stop working (due to changing device IDs, and so on).
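The workaround above can be sketched as a short shell fragment. The export line uses the variable and value given above. The show_device_var function is a hypothetical diagnostic helper, not part of Forge, that prints the value of the variable named by FORGE_CUDA_DEVICE_VAR so that you can confirm the launcher sets it for each process. It does not show how DDT itself uses the variable.

```shell
#!/bin/sh
# Tell DDT which per-process variable to use for device assignment.
# OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI for each launched process.
export FORGE_CUDA_DEVICE_VAR=OMPI_COMM_WORLD_LOCAL_RANK

# Hypothetical diagnostic helper (not part of Forge): print the value of the
# variable that FORGE_CUDA_DEVICE_VAR names, or "unset" if it has no value.
show_device_var() {
    eval "value=\${$FORGE_CUDA_DEVICE_VAR:-unset}"
    echo "$FORGE_CUDA_DEVICE_VAR=$value"
}
```

Outside an Open MPI launch the helper is expected to report the variable as unset. Inside one, each process on a node should report a different local rank.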

Thread control

The focus on thread feature is not supported as it can lock up the GPU. This means that it is not possible to control multiple GPUs in the same process individually.

Detect invalid accesses (memcheck)

Detect invalid accesses (memcheck) is not supported with CUDA 12.

Notes

  • NVIDIA CUDA toolkit and driver - Linaro recommends using the most recent version of the toolkit. For more information, see Reference table.

  • X11 cannot be running on any GPU used for debugging. (Any GPU running X11 is excluded from device enumeration.)

  • You must compile with -g -G to enable GPU debugging. Otherwise, your program will run through the contents of kernels without stopping.

  • It is not possible to spot unsuccessful kernel launches or failures. An error code is provided by cudaGetLastError() in the SDK, which you can call in your code to detect this.

  • Device memory allocated via cudaMalloc() is not visible outside of the kernel function. Add @global to the type specifier to view the data of device memory pointers while on the host. See View array data for more information.

  • Not all illegal program behavior can be caught in the debugger, for example, divide-by-zero.

  • Breakpoints in divergent code might not behave as expected.

  • Debugging applications with multiple CUDA contexts running on the same GPU is not supported.

  • If the CUDA environment variable CUDA_VISIBLE_DEVICES=<index> is used to target a particular GPU, make sure no X server is running on any of the GPUs.

    Note

    Any GPU running X will be excluded from enumeration, which can affect the device IDs.

  • If memory debugging and CUDA support are enabled, only thread-safe memory debugging libraries are supported.

  • You may encounter an issue stopping at a GPU breakpoint with the CUDA 12.9 driver on SuSE 15. Contact Forge Support should you encounter this issue.
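The notes above on compile flags and on CUDA_VISIBLE_DEVICES can be illustrated with two small shell helpers. This is a sketch only: the function names build_cuda_debug and run_on_device are hypothetical, and it assumes that nvcc from the CUDA Toolkit is on PATH.

```shell
#!/bin/sh
# Hypothetical helper: compile one CUDA source file for debugging.
# -g generates host debug information, -G generates device debug information.
build_cuda_debug() {
    src=$1
    out=$2
    nvcc -g -G -o "$out" "$src"
}

# Hypothetical helper: run a command with a single GPU index visible to CUDA.
# The variable is set only for that command, not for the calling shell.
run_on_device() {
    index=$1
    shift
    CUDA_VISIBLE_DEVICES=$index "$@"
}
```

For example, build_cuda_debug kernel.cu kernel followed by run_on_device 0 ./kernel builds a hypothetical kernel.cu with debug information and runs it with only device index 0 visible. As noted above, a GPU running X is excluded from enumeration, so the index may not refer to the device you expect.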