Customize NVIDIA GPU profiling behavior

The interval at which CUPTI samples GPU warps can be set with the environment variable FORGE_SAMPLER_GPU_INTERVAL. Accepted values are max, high, mid, low, and min. The default is high. These correspond to the values of the enum CUpti_ActivityPCSamplingPeriod in the CUPTI API documentation.
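As an illustration, the following minimal Python sketch builds an environment with one of the named levels set. The helper name env_with_named_interval is hypothetical and not part of the product. Only the variable name and the accepted values come from the text above.

```python
import os

# Named levels listed in the text above; "high" is the documented default.
NAMED_LEVELS = ("max", "high", "mid", "low", "min")


def env_with_named_interval(level, base_env=None):
    """Return a copy of an environment with FORGE_SAMPLER_GPU_INTERVAL set.

    Hypothetical helper for illustration only. `base_env` defaults to the
    current process environment.
    """
    if level not in NAMED_LEVELS:
        raise ValueError(
            f"unsupported level {level!r}; expected one of {NAMED_LEVELS}"
        )
    env = dict(os.environ if base_env is None else base_env)
    env["FORGE_SAMPLER_GPU_INTERVAL"] = level
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run together with whatever profiler command line you normally use. The command itself is not shown here because it depends on your installation.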

With CUDA 11.0+ on GPUs with compute capability 7.0+, you can also set FORGE_SAMPLER_GPU_INTERVAL to an integer x, where 5 ≤ x ≤ 31. This sets the sampling interval to exactly 2^x cycles.
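To make the integer form concrete, this short Python sketch converts an exponent x into the resulting interval of 2^x cycles and rejects values outside the documented range. The function name interval_cycles is hypothetical and exists only for this example.

```python
def interval_cycles(x):
    """Return the sampling interval in cycles (2**x) for an integer setting x.

    Hypothetical helper: the range 5..31 is the one documented for the
    integer form of FORGE_SAMPLER_GPU_INTERVAL.
    """
    if not isinstance(x, int) or not 5 <= x <= 31:
        raise ValueError("x must be an integer with 5 <= x <= 31")
    return 2 ** x


print(interval_cycles(5))   # smallest interval: 32 cycles
print(interval_cycles(31))  # largest interval: 2147483648 cycles
```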

Reducing the sampling interval means warp samples are taken more frequently. While this may be needed for very short-lived kernels, setting the interval too low can produce a very large number of warp samples, which then require significant post-processing time when the kernel completes. Overheads of twice the kernel’s normal runtime have been observed. We recommend not reducing the CUPTI sampling interval.
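The growth in sample volume can be estimated with simple arithmetic. The sketch below assumes that the number of samples is inversely proportional to the interval in cycles, which is a simplification and not a statement about CUPTI internals. The function name relative_sample_count is hypothetical.

```python
def relative_sample_count(x_from, x_to):
    """Estimate how the sample count scales when the exponent changes.

    Assumption (not from the documentation): samples taken are inversely
    proportional to the interval of 2**x cycles. A result above 1 means
    more samples than before, and so more post-processing.
    """
    for x in (x_from, x_to):
        if not isinstance(x, int) or not 5 <= x <= 31:
            raise ValueError("exponents must be integers with 5 <= x <= 31")
    return 2.0 ** (x_from - x_to)


# Lowering the exponent from 12 to 8 shortens the interval from
# 4096 to 256 cycles, so roughly 16 times as many samples are expected.
print(relative_sample_count(12, 8))  # 16.0
```

Under this assumption, each step down in x doubles the number of samples to post-process, which is why small exponents can become expensive quickly.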