NVIDIA CUDA

Be aware of the following known issues:

  • Linaro DDT memory leak reports do not track GPU memory leaks.

  • Debugging paired CPU/GPU core files is possible but is not yet fully supported.

  • CUDA metrics in Linaro MAP and Linaro Performance Reports are not available for statically-linked programs.

  • CUDA metrics in Linaro MAP are measured at the node level, not the card level.

  • NVIDIA Linux driver 418.43 or later might restrict GPU profiling to users with administrative privileges (CAP_SYS_ADMIN capability set). See the following NVIDIA page for details and instructions for disabling this restriction: NVIDIA Development Tools Solutions - ERR_NVGPUCTRPERM: Permission issue with Performance Counters.

  • Cray CCE 8.1.2 and earlier releases fail to generate debug information for local variables in OpenACC accelerated regions. Install CCE 8.1.3 to address this issue.

  • When debugging a CUDA application, adding watchpoints on kernel code is not supported.

  • When debugging a CUDA application, adding watchpoints on host code is only supported in CUDA 11.0 or later.

  • When debugging a CUDA application, using the Step threads together box and Run to here to step into OpenMP regions is not supported. Use breakpoints to stop at the required line.

  • When CUDA is set to Detect invalid accesses (memcheck), placing breakpoints in CUDA kernels is only supported in CUDA 11.

  • Detect invalid accesses (memcheck) is not supported with CUDA 12.

  • Profiling might hang when CUDA Kernel Analysis mode is enabled with CUDA Toolkit 12.0.1 or later. If you encounter this issue, contact Forge Support.
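
For the driver restriction on GPU profiling noted above, the following minimal Python sketch shows one way to check whether a node is likely to be affected. It is not part of Linaro Forge. It assumes that the NVIDIA kernel module exposes its parameters in /proc/driver/nvidia/params and that the RmProfilingAdminOnly entry reflects the restriction, as described on the NVIDIA ERR_NVGPUCTRPERM page. Confirm both against that page for your driver version. The function names are hypothetical and chosen for illustration only.

```python
from pathlib import Path
from typing import Optional

# Assumption: location of the NVIDIA kernel module parameters on Linux.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Return True if profiling is restricted to administrators,
    False if it is not, and None if the entry is missing."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: the restriction appears as "RmProfilingAdminOnly: 0|1".
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def driver_may_restrict(version: str) -> bool:
    """Return True if the driver version is 418.43 or later."""
    # Compare only the major and minor components, for example "535.104.05".
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= (418, 43)


if __name__ == "__main__":
    if PARAMS_PATH.exists():
        state = profiling_admin_only(PARAMS_PATH.read_text())
        print(f"Profiling restricted to administrators: {state}")
    else:
        print(f"{PARAMS_PATH} not found; is the NVIDIA driver loaded?")
```

If the check reports that profiling is restricted, follow the instructions on the NVIDIA page referenced above to lift the restriction, or run the profiler with the required privileges.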