NVIDIA CUDA

Be aware of the following known issues:

Linaro DDT memory leak reports do not track GPU memory leaks.
Debugging paired CPU/GPU core files is possible but is not yet fully supported.
CUDA metrics in Linaro MAP and Linaro Performance Reports are not available for statically-linked programs.
CUDA metrics in Linaro MAP are measured at the node level, not the card level.
NVIDIA Linux driver 418.43 or later might restrict GPU profiling to users with administrative privileges (CAP_SYS_ADMIN capability set). See the following NVIDIA page for details and instructions for disabling this restriction: NVIDIA Development Tools Solutions - ERR_NVGPUCTRPERM: Permission issue with Performance Counters.
Cray CCE 8.1.2 and earlier releases fail to generate debug information for local variables in OpenACC accelerated regions. Install CCE 8.1.3 to address this issue.
When debugging a CUDA application, adding watchpoints on kernel code is not supported.
When debugging a CUDA application, adding watchpoints on host code is only supported in CUDA 11.0 or later.
When debugging a CUDA application, using the Step threads together box and Run to here to step into OpenMP regions is not supported. Use breakpoints to stop at the required line.
When CUDA is set to Detect invalid accesses (memcheck), placing breakpoints in CUDA kernels is only supported in CUDA 11.
Detect invalid accesses (memcheck) is not supported with CUDA 12.
You may experience a hang during profiling when CUDA Kernel Analysis mode is enabled with CUDA Toolkit 12.0.1 or later. If you encounter this issue, contact Forge Support.
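
To check whether the GPU profiling restriction described above for NVIDIA Linux driver 418.43 or later applies to a node, a short script along the following lines might help. This is a minimal sketch, not part of Linaro Forge. It assumes the driver exposes its settings in /proc/driver/nvidia/params and reports the restriction under the key RmProfilingAdminOnly, as described on the NVIDIA page referenced above. The function names are hypothetical and chosen for illustration only.

```python
from pathlib import Path
from typing import Optional, Tuple

# Assumption: location of the driver parameters file on a Linux node.
PARAMS_PATH = Path("/proc/driver/nvidia/params")

# Assumption: key the driver uses to report the profiling restriction.
RESTRICTION_KEY = "RmProfilingAdminOnly"


def parse_profiling_admin_only(params_text: str) -> Optional[bool]:
    """Return True if profiling is restricted to administrators,
    False if it is not, and None if the key is not present."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == RESTRICTION_KEY:
            return value.strip() == "1"
    return None


def driver_may_restrict(version: str,
                        minimum: Tuple[int, int] = (418, 43)) -> bool:
    """Return True if the driver version string (for example "535.104.05")
    is 418.43 or later, comparing the major and minor numbers only."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= minimum


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        state = parse_profiling_admin_only(PARAMS_PATH.read_text())
        if state is None:
            print("Restriction setting not found in driver parameters.")
        elif state:
            print("GPU profiling is restricted to administrative users.")
        else:
            print("GPU profiling is available to all users.")
    else:
        print("NVIDIA driver parameters file not found on this node.")
```

If the script reports that profiling is restricted, follow the instructions on the NVIDIA page to disable the restriction, or run the profiler with administrative privileges.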