Memory transfers analysis

CUDA memory transfer analysis mode is an advanced feature that provides insight into memory transfers performed through cudaMemcpy and similar CUDA calls. When profiling programs that use CUDA 11.0 or later, you can enable this mode from the Run dialog or on the command line with --cuda-transfer-analysis.

Run dialog with CUDA memory transfers checkbox

When enabled, the GPU Memory Transfers tab shows the locations in the code where memory transfers involving CUDA devices were initiated.

The GPU Memory Transfers tab contains a tree view of the stack traces from which GPU memory transfers were initiated. The columns are as follows:

Transfer activity:: A visual representation of when GPU memory transfers were in progress. This is an approximation of the wall-clock time in which at least one GPU transfer was active.
Bytes:: The number of bytes transferred in all GPU memory transfers started in the selected range of samples.
Time spent:: The sum of the time spent in each GPU memory transfer started in the selected range of samples. If multiple memory transfers were in progress simultaneously, this number is larger than the actual wall-clock time during which transfers were in progress.
# calls:: The total number of GPU memory transfer calls that were started in the selected range of samples.
Callsite:: The stack frames where GPU memory transfers were initiated. Expand an entry to navigate the full stack down to the cudaMemcpy* call.
Source:: The source code line for this frame, if available. The source files must be available and the program must have been compiled with debug information enabled.
Position:: The source file and line number.
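
To make the relationship between the Bytes, Time spent, # calls, and Transfer activity columns concrete, the following is a minimal C++ sketch of how such totals could be derived from a list of transfer records. It is an illustration only, not the profiler's implementation: the `Transfer` and `Summary` types and the `summarize` function are hypothetical names, and the timestamps are assumed to be in milliseconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one GPU memory transfer (not a profiler type).
struct Transfer {
    double start_ms;      // when the transfer started
    double end_ms;        // when the transfer finished
    std::uint64_t bytes;  // number of bytes moved
};

// Hypothetical totals mirroring the columns described above.
struct Summary {
    std::uint64_t bytes = 0;   // "Bytes": total bytes in all transfers
    double time_spent_ms = 0;  // "Time spent": sum of per-transfer durations
    std::size_t calls = 0;     // "# calls": number of transfers
    double active_ms = 0;      // wall-clock time with at least one transfer active
};

Summary summarize(std::vector<Transfer> transfers) {
    Summary s;
    for (const Transfer& t : transfers) {
        assert(t.end_ms >= t.start_ms);  // each record must be a valid interval
        s.bytes += t.bytes;
        s.time_spent_ms += t.end_ms - t.start_ms;
        ++s.calls;
    }

    // Merge overlapping intervals so that concurrent transfers are
    // counted only once towards the wall-clock activity time.
    std::sort(transfers.begin(), transfers.end(),
              [](const Transfer& a, const Transfer& b) {
                  return a.start_ms < b.start_ms;
              });
    bool open = false;
    double cur_start = 0.0;
    double cur_end = 0.0;
    for (const Transfer& t : transfers) {
        if (!open) {
            cur_start = t.start_ms;
            cur_end = t.end_ms;
            open = true;
        } else if (t.start_ms <= cur_end) {
            cur_end = std::max(cur_end, t.end_ms);  // extend the current interval
        } else {
            s.active_ms += cur_end - cur_start;     // close the finished interval
            cur_start = t.start_ms;
            cur_end = t.end_ms;
        }
    }
    if (open) {
        s.active_ms += cur_end - cur_start;
    }
    return s;
}
```

For example, two transfers that each take 4 ms and overlap by 2 ms contribute 8 ms to Time spent but only 6 ms of wall-clock transfer activity, which is why Time spent can exceed the time shown as active.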

Note

Additional information can be found in the tooltip for each line, including the min/max/average bytes transferred per memory transfer call, and the min/max/average time spent in each call.
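
As a rough illustration of the per-call figures shown in the tooltip, the C++ sketch below computes the minimum, maximum, and average of a set of per-call values (bytes or time). The `Stats` type and `per_call_stats` function are hypothetical names chosen for this example and are not part of the profiler.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical container for the min/max/average of per-call values.
struct Stats {
    double min;
    double max;
    double average;
};

// Computes min, max, and arithmetic mean over one value per transfer call,
// for example the bytes transferred or the time spent in each call.
Stats per_call_stats(const std::vector<double>& values) {
    assert(!values.empty());  // statistics are undefined for zero calls
    auto mm = std::minmax_element(values.begin(), values.end());
    double total = std::accumulate(values.begin(), values.end(), 0.0);
    return Stats{*mm.first, *mm.second,
                 total / static_cast<double>(values.size())};
}
```

A large gap between the minimum and maximum for one callsite can indicate that the same line of code issues both very small and very large transfers.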