Known issues for NVIDIA GPU profiling

There are a few known issues for NVIDIA GPU profiling.

  • GPU profiling is only supported with CUDA 8.0 and later. For information about currently supported software versions, see Reference table.

  • GPU memory transfer analysis is only supported with CUDA 11.0 and later.

  • CUPTI allocates a small amount of host memory each time a kernel is launched. If your program launches many kernels in a tight loop, this overhead can skew the memory usage figures.

  • CUDA kernels generated by CUDA Fortran are not yet supported by Linaro MAP.

  • The graphs are scaled on the assumption that there is a 1:1 relationship between processes and GPUs, each process having exclusive use of its own CUDA card. The graphs may be of an unexpected height if some processes do not have a GPU, or if multiple processes share the use of a common GPU.

  • Enabling CUDA kernel analysis mode or CUDA memory transfer analysis mode can have a significant performance impact, as described in Performance impact.

  • GPU profiling is not supported when statically linking the Linaro Forge sampler library.

  • Stopping GPU profiling mid-process can prevent the GPU Kernels tab from being displayed, and kernel samples might not be reported. This occurs when using --stop-after or the Stop and Analyze button. For better results, run the process for a longer time period, with longer-running kernels. Even when kernel samples are reported, they can be truncated.

  • Profiling may hang when CUDA kernel analysis mode is enabled with CUDA Toolkit versions from 12.0.1 up to, but not including, 12.2.2. If you encounter this issue, please contact Forge Support.
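
The version limits in the list above can be checked before a profiling run. The following is a minimal sketch in Python, not part of Linaro Forge: the function names parse_version and cuda_profiling_notes are hypothetical, and the CUDA version string is assumed to be supplied by you (for example, copied from the output of nvcc --version). Only the thresholds themselves (8.0, 11.0, and the 12.0.1 to 12.2.2 range) come from this page.

```python
# Illustrative helper only; these functions are hypothetical and are not
# provided by Linaro Forge. Thresholds are taken from the known-issues list.

def parse_version(text):
    """Turn a dotted version such as '12.1.0' into a 3-tuple of ints."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))  # pad '12.1' to (12, 1, 0)
    return tuple(parts[:3])


def cuda_profiling_notes(version_text):
    """Return the version-related known issues that apply to a CUDA version."""
    version = parse_version(version_text)
    notes = []
    if version < (8, 0, 0):
        notes.append("GPU profiling is not supported (requires CUDA 8.0 or later)")
    if version < (11, 0, 0):
        notes.append("GPU memory transfer analysis is not supported (requires CUDA 11.0 or later)")
    # Half-open range: 12.0.1 is affected, 12.2.2 is not.
    if (12, 0, 1) <= version < (12, 2, 2):
        notes.append("CUDA kernel analysis mode may hang during profiling")
    return notes


if __name__ == "__main__":
    for v in ("7.5", "10.2", "12.1.0", "12.2.2"):
        print(v, cuda_profiling_notes(v))
```

An empty list means that none of the version-related issues apply. The other items in the list, such as static linking or GPU sharing between processes, are not covered by this check.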