Enabling CUPTI sampling will impact the target program in the following ways:
Taken together, the above may have a significant impact on the target
program, potentially slowing it down by orders of magnitude. To
combat this, profile and analyze CUDA code kernels (with
--cuda-kernel-analysis) and non-CUDA code (without
--cuda-kernel-analysis) in separate profiling sessions.
The NVIDIA GPU metrics will be adversely affected by this overhead,
particularly the GPU utilization metric. See Accelerator.
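The advice to use separate profiling sessions can be sketched as a small helper that builds the two command lines, one with --cuda-kernel-analysis and one without. This is only an illustration: the launcher invocation (map --profile), the function name session_commands, and the program name ./my_cuda_app are assumptions that do not come from this section, so adjust them to match your installation.

```python
import shlex

# Assumption: the profiler is started as "map --profile"; change this
# to whatever launcher command your installation actually uses.
PROFILER = ["map", "--profile"]
KERNEL_FLAG = "--cuda-kernel-analysis"


def session_commands(target, target_args=()):
    """Return (kernel_session, host_session) argument lists for two separate runs."""
    program = [target, *target_args]
    # Session 1: CUDA kernel analysis enabled, expect a large slowdown.
    kernel_session = [*PROFILER, KERNEL_FLAG, *program]
    # Session 2: no kernel analysis, so host-side timings stay representative.
    host_session = [*PROFILER, *program]
    return kernel_session, host_session


if __name__ == "__main__":
    # "./my_cuda_app" and its arguments are hypothetical placeholders.
    for argv in session_commands("./my_cuda_app", ["--steps", "100"]):
        print(shlex.join(argv))
```

Running the two sessions separately keeps the kernel analysis overhead out of the session used to examine non-CUDA code.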
Enabling the CUDA memory transfer analysis feature will impact the target program in the following ways:
This overhead will primarily impact the host (CPU). GPU kernel performance should be unaffected unless the host overhead delays one or more memory transfers that a GPU kernel needs in order to progress.
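The distinction between host overhead and kernel performance can be illustrated with a toy timeline, assuming one host-to-device transfer followed by one kernel that depends on it. All durations below are hypothetical values chosen for the example, not measurements, and kernel_timeline is an illustrative name.

```python
def kernel_timeline(transfer_s, kernel_s, host_overhead_s=0.0):
    """Return (kernel_start, kernel_end) in seconds for one dependent kernel."""
    # The kernel cannot start until its input transfer has completed,
    # so any host-side delay before the transfer shifts the kernel start.
    kernel_start = host_overhead_s + transfer_s
    # The kernel's own execution time is not changed by the host overhead.
    kernel_end = kernel_start + kernel_s
    return kernel_start, kernel_end


# Hypothetical durations: 0.25 s transfer, 1.0 s kernel, 0.5 s host overhead.
baseline = kernel_timeline(0.25, 1.0)
profiled = kernel_timeline(0.25, 1.0, host_overhead_s=0.5)

print("baseline:", baseline)
print("profiled:", profiled)
```

In this model the kernel runs for the same length of time in both cases, but with the overhead it starts and finishes later because the transfer it needs was delayed.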
When profiling CUDA code, it may be useful to profile only a short subsection of the program so that time is not wasted waiting for CUDA kernels you do not intend to examine. For instructions, see "Profiling only part of a program" in "Profile a program".