The NVIDIA CUDA metrics are enabled if you have Arm® Forge Professional. Contact Arm support for upgrade information at: support-hpc-sw@arm.com.
Note
Accelerator metrics are not available when linking to the static Arm® MAP sampler library.
Percentage of time that the GPU card was in use, that is, one or more kernels were executing on the GPU card. If multiple cards are present in a compute node, this value is the mean across all the cards in that node. This metric is adversely affected if CUDA kernel analysis mode is enabled.
See CUDA Kernel analysis.
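The averaging described above can be illustrated with a small Python sketch. It is not how MAP collects or computes the metric internally; the sample lists and the function names `card_utilization` and `node_utilization` are hypothetical and exist only to show the arithmetic of "percent of time in use, averaged over the cards in a node".

```python
from statistics import mean


def card_utilization(samples):
    """Percent of samples in which at least one kernel was executing.

    `samples` is a hypothetical list holding the number of kernels
    that were executing on one card at each sample point.
    """
    if not samples:
        return 0.0
    busy = sum(1 for active_kernels in samples if active_kernels > 0)
    return 100.0 * busy / len(samples)


def node_utilization(per_card_samples):
    """Mean of the per-card percentages across all cards in one node."""
    return mean(card_utilization(s) for s in per_card_samples)


# Hypothetical data for a node with two cards.
card0 = [1, 2, 0, 1]  # busy in 3 of 4 samples
card1 = [0, 0, 1, 0]  # busy in 1 of 4 samples
node_percent = node_utilization([card0, card1])
print(node_percent)
```

Because the node value is a mean, one idle card lowers the reported figure even when another card in the same node is fully busy.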
Metrics summarizing CUDA memory transfers are available for CUDA 11+ programs, including heterogeneous workloads where some processes use GPUs and others do not.
Three categories of metric are available:
Byte Transfer Rate: Bytes transferred per second per process.
Memory Transfer Rate: Transfers per second per process.
Time Spent in Memory Transfers: Proportion of time in transfers per process.
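As a rough illustration of how the three categories relate to one another, the following Python sketch derives each of them from a list of transfer events for a single process. The `Transfer` record, the `transfer_metrics` function, and the event values are assumptions made for this example and are not part of MAP. The sketch also assumes that transfers do not overlap in time.

```python
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical record of one memory transfer in one process."""
    start: float  # seconds since the start of the interval
    end: float    # seconds since the start of the interval
    nbytes: int   # bytes moved by this transfer


def transfer_metrics(transfers, interval):
    """Compute the three metric categories over `interval` seconds."""
    total_bytes = sum(t.nbytes for t in transfers)
    # Assumes transfers do not overlap; overlapping transfers would
    # be counted twice by this simple sum.
    busy_seconds = sum(t.end - t.start for t in transfers)
    return {
        "bytes_per_second": total_bytes / interval,
        "transfers_per_second": len(transfers) / interval,
        "time_fraction": busy_seconds / interval,
    }


# Two hypothetical transfers observed over a 2 second interval.
events = [Transfer(0.0, 0.25, 1024), Transfer(1.0, 1.25, 3072)]
metrics = transfer_metrics(events, interval=2.0)
print(metrics)
```

The byte rate and the transfer rate can diverge sharply: many small transfers give a high transfer rate with a low byte rate, and a few large transfers give the opposite.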
Note
If a very large number of memory transfer events occur in the program, the Time Spent in Memory Transfers metric might only provide an approximation.
Different types of memory transfer can occur in the program you are profiling. For example, the program can transfer data between host memory and GPU device, or between different GPU devices on the host. Six memory transfer types are available within each category:
Sum of host-to-device, device-to-host, and peer-to-peer types (everything using PCIe or NVLink).
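A minimal sketch of the summed type, assuming per-type rates are already known for one process. The dictionary keys, the `other` entry, and the function name `combined_rate` are hypothetical and only illustrate which types contribute to the sum.

```python
# Types that contribute to the summed value, as described above.
COMBINED_TYPES = ("host_to_device", "device_to_host", "peer_to_peer")


def combined_rate(rates_by_type):
    """Sum the rates of the three types listed above.

    Any type outside COMBINED_TYPES is ignored, and a missing type
    counts as zero.
    """
    return sum(rates_by_type.get(kind, 0.0) for kind in COMBINED_TYPES)


# Hypothetical byte transfer rates (bytes per second) for one process.
rates = {
    "host_to_device": 1500.0,
    "device_to_host": 500.0,
    "peer_to_peer": 250.0,
    "other": 4000.0,  # hypothetical type, excluded from the sum
}
total = combined_rate(rates)
print(total)
```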
Selecting the category using the
mechanism displays the relevant metrics for all memory transfer types occurring within the program.
The AMD ROCm metrics are enabled if you have an Arm® Forge licence with ROCm support. Contact Arm support for upgrade information at: support-hpc-sw@arm.com.
Note
Accelerator metrics are not available when linking to the static Arm® MAP sampler library.