Known issues and limitations
There are a few known issues for NVIDIA GPU profiling.
NVIDIA Linux driver 418.43 or later might restrict GPU profiling to users with administrative privileges (CAP_SYS_ADMIN capability set). See the following NVIDIA page for details and instructions for disabling this restriction: NVIDIA Development Tools Solutions - ERR_NVGPUCTRPERM: Permission issue with Performance Counters.
GPU memory transfer analysis is only supported with CUDA 12.0 and later.
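To see whether the driver restriction on GPU profiling is active on a given machine, one option is to inspect the driver's parameter listing. The sketch below is a minimal illustration, not part of Linaro Forge. It assumes the driver exposes its settings as "Key: value" lines in /proc/driver/nvidia/params and that a RmProfilingAdminOnly value of 1 means profiling is limited to administrative users, as the NVIDIA page describes. The function name profiling_admin_only is hypothetical.

```python
from typing import Optional


def profiling_admin_only(params_text: str) -> Optional[bool]:
    """Interpret the text of the NVIDIA driver parameter listing.

    Assumption: the listing uses "Key: value" lines and the key
    RmProfilingAdminOnly is 1 when profiling is restricted to
    administrative users.

    Returns True if restricted, False if not restricted, and None
    if the key does not appear in the text at all.
    """
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None
```

On a system with the NVIDIA driver loaded, the function could be given the contents of /proc/driver/nvidia/params (for example, read with pathlib.Path.read_text()). A result of True suggests following the instructions on the NVIDIA page before profiling as a non-administrative user.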
CUPTI allocates a small amount of host memory each time a kernel is launched. If your program launches many kernels in a tight loop, this overhead can skew the memory usage figures.
CUDA kernels generated by CUDA Fortran are not yet supported by Linaro MAP.
The graphs are scaled on the assumption that there is a 1:1 relationship between processes and GPUs, each process having exclusive use of its own CUDA card. The graphs may be of an unexpected height if some processes do not have a GPU, or if multiple processes share the use of a common GPU.
Enabling CUDA kernel analysis mode or CUDA memory transfer analysis mode can have a significant performance impact as described in Performance impact.
GPU profiling is not supported when statically linking the Linaro Forge sampler library.
Stopping GPU profiling mid-process can prevent the GPU Kernels tab from being displayed, cause kernel samples to go unreported, or truncate the kernel samples. This occurs when using --stop-after or the Stop and Analyze button. For better results, run the process for a longer time period with longer running kernels.
You may experience a hang during profiling when CUDA Kernel Analysis mode is enabled for CUDA Toolkit >= 12.0.1 and < 12.2.2. If you encounter this issue, please contact Forge Support.
CUDA Kernel Analysis is not supported for Blackwell GPUs and beyond for CUDA 12. This is because the PC Sampling Activity API used by this feature has been deprecated by NVIDIA and has dropped support for Blackwell.
CUDA Kernel Analysis is not supported for CUDA 13.
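The version-dependent limitations above can be collected into a small pre-flight check. The following sketch is illustrative only and is not a Linaro Forge API. The names parse_version and gpu_profiling_caveats are hypothetical, and the CUDA Toolkit version string and GPU generation are assumed to be supplied by the caller rather than detected automatically.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Turn a string such as "12.2" or "12.2.2" into a 3-part tuple."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad missing components with zeros
    return (parts[0], parts[1], parts[2])


def gpu_profiling_caveats(cuda_version: str,
                          blackwell_or_newer: bool = False) -> List[str]:
    """List the version-related known issues that apply to a setup."""
    v = parse_version(cuda_version)
    notes: List[str] = []
    if v < (12, 0, 0):
        notes.append("GPU memory transfer analysis requires CUDA 12.0 or later")
    if (12, 0, 1) <= v < (12, 2, 2):
        notes.append("CUDA Kernel Analysis may hang with this toolkit version")
    if v[0] == 12 and blackwell_or_newer:
        notes.append("CUDA Kernel Analysis is not supported on Blackwell or newer GPUs")
    if v[0] == 13:
        notes.append("CUDA Kernel Analysis is not supported for CUDA 13")
    return notes
```

For example, gpu_profiling_caveats("12.1.0") reports only the possible hang, while gpu_profiling_caveats("12.2.2") returns an empty list. Tuples are compared component by component, which keeps the half-open range for the hang (>= 12.0.1 and < 12.2.2) easy to express.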