CUDA Kernel analysis
CUDA kernel analysis mode is an advanced feature that provides insight
into the activity within CUDA kernels. This mode can be enabled from the
Run dialog or from the command line with the --cuda-kernel-analysis option.

When this mode is enabled, the GPU Kernels tab is enhanced to show a line-level
breakdown of warp stalls. The possible warp stall reasons are the categories
listed in the enum CUpti_ActivityPCSamplingStallReason in the CUPTI API
documentation:
- Selected
No stall, instruction is selected for issue.
- Instruction fetch
Warp is blocked because the next instruction is not yet available, either because of an instruction cache miss or because of branching effects.
- Execution dependency
Instruction is waiting on an arithmetic dependency.
- Memory dependency
Warp is blocked because it is waiting for a memory access to complete.
- Texture sub-system
Texture sub-system is fully utilized or has too many outstanding requests.
- Thread or memory barrier
Warp is blocked because it is waiting at __syncthreads or at a memory barrier.
- __constant__ memory
Warp is blocked waiting for a __constant__ memory or immediate memory access to complete.
- Pipe busy
Compute operation cannot be performed because a required resource is not available.
- Memory throttle
Warp is blocked because there are too many pending memory operations.
- Not selected
Warp was ready to issue, but some other warp issued instead.
- Other
Miscellaneous stall reason.
- Dropped samples
Samples dropped (not collected) by hardware due to backpressure or overflow.
- Unknown
The stall reason could not be determined. Used when CUDA kernel analysis has not been enabled (see above) or when an internal error occurred within CUPTI or Linaro MAP.
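A line-level breakdown of this kind can be pictured as counting sampled warp states per source line and per stall category. The Python sketch below is illustrative only: the `StallReason` enum, the `(line, reason)` sample tuples and the helper functions are hypothetical, and the enum values are the display names from the list above, not CUPTI's `CUpti_ActivityPCSamplingStallReason` values. It does not describe how Linaro MAP or CUPTI store the data internally.

```python
from collections import Counter, defaultdict
from enum import Enum


# Hypothetical mirror of the stall categories listed above. Only the display
# names follow the list; the values are NOT CUPTI's enum values.
class StallReason(Enum):
    SELECTED = "Selected"
    INSTRUCTION_FETCH = "Instruction fetch"
    EXECUTION_DEPENDENCY = "Execution dependency"
    MEMORY_DEPENDENCY = "Memory dependency"
    TEXTURE_SUBSYSTEM = "Texture sub-system"
    BARRIER = "Thread or memory barrier"
    CONSTANT_MEMORY = "__constant__ memory"
    PIPE_BUSY = "Pipe busy"
    MEMORY_THROTTLE = "Memory throttle"
    NOT_SELECTED = "Not selected"
    OTHER = "Other"
    DROPPED_SAMPLES = "Dropped samples"
    UNKNOWN = "Unknown"


def breakdown_by_line(samples):
    """Count hypothetical (source_line, StallReason) samples per source line."""
    per_line = defaultdict(Counter)
    for line, reason in samples:
        per_line[line][reason] += 1
    return per_line


def fraction(counts, reason):
    """Share of one line's samples attributed to `reason` (0.0 if the line has none)."""
    total = sum(counts.values())
    return counts[reason] / total if total else 0.0


# Made-up samples: line 42 waits mostly on memory, line 57 sits at a barrier.
samples = [
    (42, StallReason.MEMORY_DEPENDENCY), (42, StallReason.MEMORY_DEPENDENCY),
    (42, StallReason.SELECTED), (42, StallReason.NOT_SELECTED),
    (57, StallReason.BARRIER), (57, StallReason.BARRIER),
]
per_line = breakdown_by_line(samples)
print(fraction(per_line[42], StallReason.MEMORY_DEPENDENCY))  # 0.5
print(fraction(per_line[57], StallReason.BARRIER))  # 1.0
```

Reading such a breakdown, a line dominated by Memory dependency suggests the warps are waiting on loads, whereas a line dominated by Thread or memory barrier points at synchronization.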

Note
Warp stalls are only reported per kernel, so it is not possible to determine when, within a kernel invocation, the different categories of warp stall occurred. Because function calls in CUDA kernels are automatically fully inlined, it is not possible to see a stack trace of code within a kernel on the GPU.
Warp stall information is also shown in the Source code editor (see GPU programs), in the Selected lines view (see NVIDIA GPU CUDA profiles), and as a Warp stall reasons graph in the Metrics view (see Metrics view).
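The limitation that warp stalls are reported only per kernel can be made concrete: a per-kernel report is essentially a count of stall reasons, and counting discards ordering. The Python sketch below uses two hypothetical, made-up stall sequences for invocations of the same kernel. One stalls on memory early and the other late, yet both reduce to the same counts. The `per_kernel_report` helper is illustrative and is not part of any Linaro MAP or CUPTI API.

```python
from collections import Counter

# Hypothetical stall sequences for two invocations of the same kernel,
# listed in the order in which the stalls happened.
early_memory_stalls = ["Memory dependency", "Memory dependency", "Selected", "Selected"]
late_memory_stalls = ["Selected", "Selected", "Memory dependency", "Memory dependency"]


def per_kernel_report(stall_sequence):
    """Reduce a time-ordered sequence to per-kernel counts; the ordering is discarded."""
    return Counter(stall_sequence)


# Both invocations yield identical reports, so the timing cannot be recovered.
print(per_kernel_report(early_memory_stalls) == per_kernel_report(late_memory_stalls))  # True
```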
