NCCL profiling
You can profile programs that use NCCL with the NCCL profiling capabilities. NCCL profiling is enabled by default when your Linaro Forge license includes CUDA support. If your license does not include CUDA support, contact Forge Support for upgrade information.
Linaro MAP provides several metrics that break down NCCL communication to help you identify communication bottlenecks. See Accelerator metrics for more information.
In addition, NCCL kernels that were executed appear in the GPU Kernels tab.
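As a rough aid when judging whether an NCCL collective is a bottleneck, it can help to turn a message size and an elapsed time into an approximate bandwidth figure. The sketch below is not part of Linaro MAP and does not reproduce any MAP metric. It assumes the convention used by the nccl-tests benchmarks, where bus bandwidth is the algorithm bandwidth multiplied by a factor that depends on the collective. The function names and the example numbers are hypothetical.

```python
# Back-of-the-envelope bandwidth estimate for a single NCCL collective.
# Assumption: the scaling factors follow the nccl-tests convention;
# they are not defined by Linaro MAP. All names here are hypothetical.

# Bus-bandwidth factor per collective, as a function of the rank count n.
BUS_FACTORS = {
    "all_reduce": lambda n: 2 * (n - 1) / n,
    "broadcast": lambda n: 1.0,
}


def algorithm_bandwidth(size_bytes, seconds):
    """Bandwidth as seen by the caller: message size divided by time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return size_bytes / seconds


def bus_bandwidth(collective, size_bytes, seconds, n_ranks):
    """Algorithm bandwidth scaled by the collective's factor."""
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    factor = BUS_FACTORS[collective](n_ranks)
    return algorithm_bandwidth(size_bytes, seconds) * factor


if __name__ == "__main__":
    # Hypothetical measurement: 1 GiB all-reduce over 8 ranks in 20 ms.
    size = 1 << 30
    elapsed = 0.020
    print("algbw (GB/s):", algorithm_bandwidth(size, elapsed) / 1e9)
    print("busbw (GB/s):", bus_bandwidth("all_reduce", size, elapsed, 8) / 1e9)
```

Comparing an estimate like this with the peak bandwidth of the interconnect gives a first indication of whether a collective is limited by the hardware or by something else, such as load imbalance between ranks.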