CUDA kernel analysis

CUDA kernel analysis mode is an advanced feature that provides insight into the activity within CUDA kernels. This mode can be enabled from the Run dialog or from the command line with --cuda-kernel-analysis.

When enabled, the GPU Kernels tab is enhanced to show a line-level breakdown of warp stalls. The warp stall categories are those listed in the enum CUpti_ActivityPCSamplingStallReason in the CUPTI API documentation:

Selected: No stall, instruction is selected for issue.
Instruction fetch: Warp is blocked because the next instruction is not yet available, due to an instruction cache miss or branching effects.
Execution dependency: Instruction is waiting on an arithmetic dependency.
Memory dependency: Warp is blocked because it is waiting for a memory access to complete.
Texture sub-system: Texture sub-system is fully utilized or has too many outstanding requests.
Thread or memory barrier: Warp is blocked as it is waiting at __syncthreads or at a memory barrier.
__constant__ memory: Warp is blocked waiting for __constant__ memory and immediate memory access to complete.
Pipe busy: Compute operation cannot be performed because a required resource is not available.
Memory throttle: Warp is blocked because there are too many pending memory operations.
Not selected: Warp was ready to issue, but some other warp issued instead.
Other: Miscellaneous stall reason.
Dropped samples: Samples dropped (not collected) by hardware due to backpressure or overflow.
Unknown: The stall reason could not be determined. Used when CUDA kernel analysis has not been enabled (see above) or when an internal error occurred within CUPTI or Linaro MAP.
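
To make the idea of a line-level breakdown concrete, the following Python sketch tallies (source line, stall category) samples into per-line percentages. The category labels are taken from the list above. The function name line_breakdown, the sample data, and the approach itself are assumptions made for illustration. This is not how Linaro MAP collects, stores, or exports its data.

```python
from collections import Counter, defaultdict

# Category labels as listed in this section.
STALL_CATEGORIES = [
    "Selected", "Instruction fetch", "Execution dependency",
    "Memory dependency", "Texture sub-system", "Thread or memory barrier",
    "__constant__ memory", "Pipe busy", "Memory throttle",
    "Not selected", "Other", "Dropped samples", "Unknown",
]


def line_breakdown(samples):
    """Return {line: {category: percent}} from (line, category) pairs.

    Hypothetical helper: any label not in STALL_CATEGORIES is counted
    as "Unknown".
    """
    per_line = defaultdict(Counter)
    for line, category in samples:
        if category not in STALL_CATEGORIES:
            category = "Unknown"
        per_line[line][category] += 1

    breakdown = {}
    for line, counts in per_line.items():
        total = sum(counts.values())
        breakdown[line] = {cat: 100.0 * n / total for cat, n in counts.items()}
    return breakdown


# Made-up samples for two source lines of a single kernel.
samples = [
    (42, "Memory dependency"), (42, "Memory dependency"),
    (42, "Selected"), (42, "Thread or memory barrier"),
    (57, "Execution dependency"), (57, "Selected"),
]
result = line_breakdown(samples)
```

In this made-up data, half of the samples on line 42 fall under Memory dependency. Because the samples carry no timestamps, the result says nothing about when during the kernel those stalls occurred.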

GPU Kernels tab (with CUDA kernel analysis)

Note

Warp stalls are only reported per kernel, so it is not possible to determine when, within a kernel invocation, each category of warp stall occurred. Because function calls in CUDA kernels are automatically fully inlined, it is not possible to see a stack trace of code within a kernel on the GPU.

Warp stall information is also shown in the Source code editor (see GPU programs), the Selected lines view (see NVIDIA GPU CUDA profiles), and a Warp stall reasons graph in the Metrics view (see Metrics view).