Known issues for NVIDIA GPU profiling

The following known issues affect NVIDIA GPU profiling.

GPU profiling is only supported with CUDA 8.0 and later. For information about currently supported software versions, see Reference table.
GPU memory transfer analysis is only supported with CUDA 11.0 and later.
When you prepare your program for profiling, the version of the CUDA toolkit must match the version of the CUDA driver. GPU profiling is not supported when a CUDA program and CUDA driver of different versions are mixed.
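
The version rules above can be expressed as a small pre-flight check. The following is a minimal sketch in Python and is not part of Linaro Forge. The function names `parse_version` and `gpu_profiling_support` are hypothetical, the version strings are supplied by you, and treating "matching" as equal major.minor numbers is an assumption made for illustration.

```python
from typing import Dict, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as "11.8" into a tuple of ints.

    A bare major number such as "8" is padded to (8, 0).
    """
    parts = [int(part) for part in text.strip().split(".")]
    while len(parts) < 2:
        parts.append(0)
    return tuple(parts)


def gpu_profiling_support(toolkit: str, driver_cuda: str) -> Dict[str, bool]:
    """Report which GPU profiling features the rules above would allow.

    Assumption: toolkit and driver "match" when major.minor are equal.
    """
    tk = parse_version(toolkit)
    drv = parse_version(driver_cuda)
    versions_match = tk[:2] == drv[:2]
    return {
        "versions_match": versions_match,
        # GPU profiling needs CUDA 8.0 or later.
        "gpu_profiling": versions_match and tk >= (8, 0),
        # GPU memory transfer analysis needs CUDA 11.0 or later.
        "memory_transfer_analysis": versions_match and tk >= (11, 0),
    }


if __name__ == "__main__":
    # Illustrative inputs only: (toolkit version, CUDA version of the driver).
    for toolkit, driver in [("11.8", "11.8"), ("10.2", "10.2"), ("12.1", "11.8")]:
        print(toolkit, driver, gpu_profiling_support(toolkit, driver))
```

A check like this can run before a profiling job is submitted, so that a version mismatch is caught before the run rather than after it.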

CUPTI allocates a small amount of host memory each time a kernel is launched. If your program launches many kernels in a tight loop, this overhead can skew the memory usage figures.
CUDA kernels generated by CUDA Fortran are not yet supported by Linaro MAP.
The graphs are scaled on the assumption that there is a 1:1 relationship between processes and GPUs, each process having exclusive use of its own CUDA card. The graphs may be of an unexpected height if some processes do not have a GPU, or if multiple processes share the use of a common GPU.
Enabling CUDA kernel analysis mode or CUDA memory transfer analysis mode can have a significant performance impact as described in Performance impact.
GPU profiling is not supported when statically linking the Linaro Forge sampler library.
Stopping GPU profiling mid-process can prevent the GPU Kernels tab from displaying, and kernel samples might not be reported. This occurs when you use --stop-after or the Stop and Analyze button. When kernel samples are reported, they can be truncated. For better results, run the process for a longer time period with longer running kernels.
You may experience a hang during profiling when CUDA Kernel Analysis mode is enabled for CUDA Toolkit >= 12.0.1 and < 12.2.2. If you encounter this issue, please contact Forge Support.
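
The affected CUDA Toolkit range for the hang described above is half-open: 12.0.1 is included and 12.2.2 is not. A minimal sketch of that range test in Python follows. It is illustrative only, the names `parse_toolkit_version` and `kernel_analysis_may_hang` are hypothetical, and padding a two-part version such as "12.1" to 12.1.0 is an assumption.

```python
from typing import Tuple

# Range taken from the known issue above: >= 12.0.1 and < 12.2.2.
HANG_RANGE_START = (12, 0, 1)  # inclusive
HANG_RANGE_END = (12, 2, 2)    # exclusive


def parse_toolkit_version(text: str) -> Tuple[int, int, int]:
    """Parse "major.minor.patch"; missing components are padded with 0."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return (parts[0], parts[1], parts[2])


def kernel_analysis_may_hang(toolkit: str) -> bool:
    """Return True if the toolkit version falls in the affected range."""
    return HANG_RANGE_START <= parse_toolkit_version(toolkit) < HANG_RANGE_END


if __name__ == "__main__":
    for version in ["12.0.0", "12.0.1", "12.1", "12.2.1", "12.2.2"]:
        print(version, kernel_analysis_may_hang(version))
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "12.10" sorting before "12.2".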