Known issues for NVIDIA GPU profiling
The following issues are known to affect NVIDIA GPU profiling:
- GPU profiling is only supported with CUDA 8.0 and later. For information about currently supported software versions, see Reference table. 
- GPU memory transfer analysis is only supported with CUDA 11.0 and later. 
- CUPTI allocates a small amount of host memory each time a kernel is launched. If your program launches many kernels in a tight loop, this overhead can skew the memory usage figures.
- CUDA kernels generated by CUDA Fortran are not yet supported by Linaro MAP. 
- The graphs are scaled on the assumption that there is a 1:1 relationship between processes and GPUs, each process having exclusive use of its own CUDA card. The graphs may be of an unexpected height if some processes do not have a GPU, or if multiple processes share the use of a common GPU. 
- Enabling CUDA kernel analysis mode or CUDA memory transfer analysis mode can have a significant performance impact as described in Performance impact. 
- GPU profiling is not supported when statically linking the Linaro Forge sampler library. 
- Stopping GPU profiling mid-process can prevent the GPU Kernels tab from displaying, and kernel samples might not be reported. This occurs when using --stop-after or the Stop and Analyze button. For better results, run the process for a longer time period with longer-running kernels. When kernel samples are reported, they can be truncated.
- You may experience a hang during profiling when CUDA Kernel Analysis mode is enabled for CUDA Toolkit >= 12.0.1 and < 12.2.2. If you encounter this issue, please contact Forge Support.
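
The version limits listed above can be checked before starting a profiling run. The following is a minimal sketch in Python, not part of Linaro Forge: the function names `parse_version` and `cuda_profiling_notes` are hypothetical, and the sketch assumes you already have the CUDA Toolkit version as a dotted string (for example, read manually from the output of `nvcc --version`). It only encodes the version ranges stated in this list.

```python
def parse_version(text):
    """Convert a dotted version string such as '12.1.1' into a 3-tuple of ints.

    Missing components are treated as 0, so '12.0' becomes (12, 0, 0).
    """
    parts = [int(p) for p in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def cuda_profiling_notes(toolkit_version):
    """Return the version-related known issues that apply to a CUDA Toolkit version.

    Hypothetical helper: the thresholds are copied from the known issues list,
    not queried from Linaro Forge itself.
    """
    v = parse_version(toolkit_version)
    notes = []
    if v < (8, 0, 0):
        notes.append("GPU profiling requires CUDA 8.0 or later")
    if v < (11, 0, 0):
        notes.append("GPU memory transfer analysis requires CUDA 11.0 or later")
    if (12, 0, 1) <= v < (12, 2, 2):
        notes.append("CUDA Kernel Analysis mode may hang during profiling")
    return notes


if __name__ == "__main__":
    for version in ("10.2", "12.1.1", "12.2.2"):
        print(version, cuda_profiling_notes(version))
```

An empty list means that none of the version-specific issues in this list apply. The other issues, such as static linking or shared GPUs, are not covered by this check.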