CUDA Kernel analysis

CUDA kernel analysis mode is an advanced feature that provides insight into the activity within CUDA kernels. This mode can be enabled from the Run dialog or from the command line with --cuda-kernel-analysis.

Run dialog with CUDA kernel analysis enabled

When enabled, the GPU Kernels tab is enhanced to show a line-level breakdown of warp stalls. The warp stall categories are those listed in the CUpti_ActivityPCSamplingStallReason enum in the CUPTI API documentation.
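To make the idea of a line-level breakdown concrete, the following is a minimal sketch of how stall samples could be tallied per source line. It does not describe how Linaro MAP is implemented. The sample records, the line numbers, and the helper name breakdown_by_line are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical samples as (source line, stall reason) pairs.
# Illustrative values only, not real profiler output.
samples = [
    (42, "Memory dependency"),
    (42, "Memory dependency"),
    (42, "Selected"),
    (43, "Execution dependency"),
    (47, "Thread or memory barrier"),
    (47, "Thread or memory barrier"),
    (47, "Thread or memory barrier"),
    (47, "Not selected"),
]


def breakdown_by_line(samples):
    """Return {line: {reason: fraction of that line's samples}}."""
    counts = defaultdict(Counter)
    for line, reason in samples:
        counts[line][reason] += 1
    return {
        line: {reason: n / sum(reasons.values()) for reason, n in reasons.items()}
        for line, reasons in counts.items()
    }


result = breakdown_by_line(samples)
for line in sorted(result):
    parts = ", ".join(
        f"{reason}: {share:.0%}" for reason, share in result[line].items()
    )
    print(f"line {line}: {parts}")
```

In this sketch each line's fractions sum to 1, so a line dominated by a single category stands out immediately.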

Selected

No stall, instruction is selected for issue.

Instruction fetch

Warp is blocked because the next instruction is not yet available, because of an instruction cache miss, or because of branching effects.

Execution dependency

Instruction is waiting on an arithmetic dependency.

Memory dependency

Warp is blocked because it is waiting for a memory access to complete.

Texture sub-system

Texture sub-system is fully utilized or has too many outstanding requests.

Thread or memory barrier

Warp is blocked as it is waiting at __syncthreads or at a memory barrier.

__constant__ memory

Warp is blocked waiting for __constant__ memory and immediate memory accesses to complete.

Pipe busy

Compute operation cannot be performed because a required resource is not available.

Memory throttle

Warp is blocked because there are too many pending memory operations.

Not selected

Warp was ready to issue, but some other warp issued instead.

Other

Miscellaneous stall reason.

Dropped samples

Samples dropped (not collected) by hardware due to backpressure or overflow.

Unknown

The stall reason could not be determined. This category is used when CUDA kernel analysis has not been enabled (see above) or when an internal error occurs within CUPTI or Linaro MAP.
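When reading a breakdown, it can help to separate samples where the warp was actually stalled from those where it issued, was ready to issue, or could not be attributed. The sketch below shows one possible grouping, inferred from the category descriptions above. The grouping, the function name summarize, and the sample counts are assumptions made for illustration, not part of Linaro MAP or CUPTI.

```python
# Grouping inferred from the category descriptions; an assumption, not an official one.
ISSUED = {"Selected"}                      # no stall, instruction issued
READY = {"Not selected"}                   # ready, but another warp issued
UNATTRIBUTED = {"Dropped samples", "Unknown"}


def summarize(counts):
    """Split per-category sample counts into four coarse totals."""
    summary = {"issued": 0, "ready": 0, "stalled": 0, "unattributed": 0}
    for reason, n in counts.items():
        if reason in ISSUED:
            summary["issued"] += n
        elif reason in READY:
            summary["ready"] += n
        elif reason in UNATTRIBUTED:
            summary["unattributed"] += n
        else:
            # Every remaining category describes a blocked warp.
            summary["stalled"] += n
    return summary


# Hypothetical sample counts for one kernel, for illustration only.
example = {
    "Selected": 10,
    "Not selected": 5,
    "Memory dependency": 20,
    "Pipe busy": 3,
    "Unknown": 2,
}
print(summarize(example))
```

A large unattributed total suggests the breakdown should be read with caution, since those samples carry no stall information.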

GPU Kernels tab (with CUDA kernel analysis)

Note

Warp stalls are only reported per kernel, so it is not possible to determine when, within a kernel invocation, each category of warp stall occurred. Because function calls in CUDA kernels are automatically fully inlined, it is also not possible to see a stack trace of code within a kernel on the GPU.

Warp stall information is also shown in the Source code editor (see GPU programs), in the Selected lines view (see NVIDIA GPU CUDA profiles), and in a Warp stall reasons graph in the Metrics view (see Metrics view).

Warp stall reasons graph