CUDA kernel analysis mode is an advanced feature that provides insight
into the activity within CUDA kernels. This mode can be enabled from the
Run dialog or from the command line with --cuda-kernel-analysis.
When enabled, the GPU Kernels tab is enhanced to show a line-level
breakdown of warp stalls. The possible categories of warp stall reasons
are those listed in the enum CUpti_ActivityPCSamplingStallReason in the
CUPTI API documentation (http://docs.nvidia.com/cuda/cupti/group__CUPTI__ACTIVITY__API.html):
Selected: No stall, instruction is selected for issue.
Instruction fetch: Warp is blocked because next instruction is not yet available, because of an instruction cache miss, or because of branching effects.
Execution dependency: Instruction is waiting on an arithmetic dependency.
Memory dependency: Warp is blocked because it is waiting for a memory access to complete.
Texture sub-system: Texture sub-system is fully utilized or has too many outstanding requests.
Thread or memory barrier: Warp is blocked as it is waiting at __syncthreads or at a memory barrier.
__constant__ memory: Warp is blocked waiting for __constant__ memory and immediate memory access to complete.
Pipe busy: Compute operation cannot be performed due to required resource not being available.
Memory throttle: Warp is blocked because there are too many pending memory operations.
Not selected: Warp was ready to issue, but some other warp issued instead.
Other: Miscellaneous stall reason.
Dropped samples: Samples dropped (not collected) by hardware due to backpressure or overflow.
Unknown: The stall reason could not be determined. Used when CUDA kernel analysis has not been enabled (see above) or when an internal error occurred within CUPTI or Arm® MAP.
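To illustrate what a line-level breakdown of warp stalls means, the following Python sketch aggregates a handful of made-up samples by source line and stall category. The sample data and the function names (breakdown_by_line, dominant_reason) are hypothetical and are not part of Arm MAP or CUPTI. The sketch only shows the kind of per-line summary described above, not how MAP computes it internally.

```python
from collections import Counter, defaultdict

# Hypothetical samples: (kernel source line, stall category) pairs.
# In a real profile these come from the profiler; they are invented here.
samples = [
    (42, "Memory dependency"),
    (42, "Memory dependency"),
    (42, "Selected"),
    (43, "Thread or memory barrier"),
    (43, "Thread or memory barrier"),
    (43, "Thread or memory barrier"),
    (43, "Not selected"),
    (47, "Execution dependency"),
]


def breakdown_by_line(samples):
    """Return {line: {category: fraction of that line's samples}}."""
    counts = defaultdict(Counter)
    for line, reason in samples:
        counts[line][reason] += 1
    result = {}
    for line, reasons in counts.items():
        total = sum(reasons.values())
        result[line] = {r: n / total for r, n in reasons.items()}
    return result


def dominant_reason(samples, line):
    """Most frequent category on a line, ignoring "Selected" (no stall)."""
    stalls = Counter(r for l, r in samples if l == line and r != "Selected")
    return stalls.most_common(1)[0][0] if stalls else None


# Print the per-line fractions in source-line order.
for line, fractions in sorted(breakdown_by_line(samples).items()):
    print(line, fractions)
```

In this invented data set, line 43 is dominated by barrier stalls. In a real kernel that would suggest examining the __syncthreads call on or near that line.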
Note
Warp stalls are reported only per kernel, so it is not possible to obtain the times within a kernel invocation at which different categories of warp stalls occurred. Because function calls in CUDA kernels are automatically fully inlined, it is also not possible to see a stack trace of code within a kernel on the GPU.
Warp stall information is also shown in the Source code editor (see GPU programs), in the Selected lines view (see NVIDIA GPU CUDA profiles), and in a Warp stall reasons graph in the Metrics view (see Metrics view).