Memory transfers analysis
CUDA memory transfer analysis mode is an advanced feature that provides insight
into memory transfers performed through CUDA by cudaMemcpy and similar calls.
When profiling programs that use CUDA 11.0 and later, this mode can be enabled
from the Run dialog or from the command line with --cuda-transfer-analysis.

When enabled, the GPU Memory Transfers tab shows the locations in the code where memory transfers involving CUDA devices were initiated.

The GPU Memory Transfers tab contains a tree view of the stack traces from which GPU memory transfers were initiated. The columns are as follows:
- Transfer activity:
A visual representation of when GPU memory transfers were in progress. This is an approximation of the wall-clock time during which at least one GPU transfer was active.
- Bytes:
The number of bytes transferred in all GPU memory transfers started in the selected range of samples.
- Time spent:
The sum of the time spent in each GPU memory transfer started in the selected range of samples. If multiple memory transfers were in progress simultaneously, this number is larger than the actual wall-clock time during which transfers were in progress.
- # calls:
The total number of GPU memory transfer calls that were started in the selected range of samples.
- Callsite:
The stack frames where GPU memory transfers were initiated. Expand to navigate the full stack down to the cudaMemcpy* call.
- Source:
The source code line for this frame, if available. The source files must be available and the program must have been compiled with debug information enabled.
- Position:
The source file and line number.
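
The difference between the Time spent column (a sum over transfers) and the wall-clock time approximated by Transfer activity can be illustrated with a small sketch. This is not the profiler's implementation: the function names and the (start, end) interval data below are hypothetical and chosen only to show why overlapping transfers make the summed figure larger.

```python
def summed_time(transfers):
    """Sum of each transfer's duration (the idea behind 'Time spent')."""
    return sum(end - start for start, end in transfers)


def wall_clock_time(transfers):
    """Time during which at least one transfer was active (union of intervals)."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(transfers):
        if current_end is None or start > current_end:
            # A gap: close the previous merged interval, if any.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the merged interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


# Hypothetical transfers as (start_ms, end_ms); the first two overlap.
transfers = [(0.0, 4.0), (2.0, 6.0), (10.0, 12.0)]

print("summed time (ms):    ", summed_time(transfers))
print("wall-clock time (ms):", wall_clock_time(transfers))
```

In this made-up data the first two transfers overlap for 2 ms, so the summed time exceeds the wall-clock time by exactly that amount.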
Note
Additional information can be found in the tooltip for each line, including the min/max/average bytes transferred per memory transfer call, and the min/max/average time spent in each call.
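
As a rough illustration of how the per-line figures relate to each other, the sketch below groups transfer records by callsite and derives the call count, total bytes, and the min/max/average values described above. The record layout, the callsite strings, and the summarize function are assumptions made for this example and do not reflect the profiler's internal data format.

```python
from collections import defaultdict


def summarize(records):
    """Group (callsite, bytes, time_ms) records and compute per-callsite statistics."""
    groups = defaultdict(list)
    for callsite, nbytes, time_ms in records:
        groups[callsite].append((nbytes, time_ms))

    summary = {}
    for callsite, items in groups.items():
        sizes = [nbytes for nbytes, _ in items]
        times = [time_ms for _, time_ms in items]
        summary[callsite] = {
            "calls": len(items),        # '# calls' column
            "bytes": sum(sizes),        # 'Bytes' column
            "time_ms": sum(times),      # 'Time spent' column
            "bytes_min": min(sizes),
            "bytes_max": max(sizes),
            "bytes_avg": sum(sizes) / len(sizes),
            "time_min": min(times),
            "time_max": max(times),
            "time_avg": sum(times) / len(times),
        }
    return summary


# Hypothetical records: (callsite, bytes transferred, time in ms).
records = [
    ("main.cu:42", 1024, 0.5),
    ("main.cu:42", 3072, 1.5),
    ("main.cu:87", 2048, 1.0),
]

summary = summarize(records)
for callsite, stats in summary.items():
    print(callsite, stats)
```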