Known issues and limitations

NCCL profiling has the following known issues and limitations:

  • NCCL metric collection requires NCCL version 2.27.3 or newer (see Reference table for more information). Earlier versions of NCCL do not provide the functionality that NCCL metrics depend on.

  • If NCCL operations occur right at the end of your program, some NCCL events might not be captured. To mitigate this, call ncclCommDestroy before the program exits and, if MPI is used, before MPI_Finalize.

  • NCCL operations that use copy-engine hardware (rather than kernels executing on SMs) do not appear in the NCCL metrics.

  • With NCCL version 2.29.2 or newer, NCCL metrics for ncclBroadcast operations that use AllGatherV do not report values. To disable AllGatherV for broadcast operations, set NCCL_ALLGATHERV_ENABLE=0 in the environment before running the program.

  • NCCL metric information may not be accurate if multiple or split NCCL communicators are used.
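
The two version-related items above can be folded into a small launch-time check. The following is a minimal sketch, not part of NCCL or of the profiler: the helper names decode_nccl_version and profiling_env are hypothetical, and the sketch assumes the integer version encoding that NCCL 2.9 and newer report (major * 10000 + minor * 100 + patch, so 2.27.3 is 22703). It rejects NCCL versions that are too old for metric collection and sets NCCL_ALLGATHERV_ENABLE=0 only for versions where broadcast may use AllGatherV.

```python
import os

# Version thresholds taken from the list above.
MIN_METRICS_VERSION = (2, 27, 3)           # oldest NCCL that supports NCCL metrics
ALLGATHERV_BROADCAST_VERSION = (2, 29, 2)  # broadcast via AllGatherV reports no values


def decode_nccl_version(code):
    """Split an NCCL version code (for example 22703) into (major, minor, patch).

    Assumes the encoding used by NCCL 2.9 and newer:
    major * 10000 + minor * 100 + patch.
    """
    return (code // 10000, (code // 100) % 100, code % 100)


def profiling_env(version, base_env=None):
    """Return a copy of the environment adjusted for NCCL metric collection.

    `version` is a (major, minor, patch) tuple. `base_env` defaults to a copy
    of the current process environment; the original mapping is not modified.
    """
    if version < MIN_METRICS_VERSION:
        raise RuntimeError(
            "NCCL %d.%d.%d is too old for NCCL metrics (need 2.27.3 or newer)" % version
        )
    env = dict(os.environ if base_env is None else base_env)
    if version >= ALLGATHERV_BROADCAST_VERSION:
        # Disable AllGatherV for broadcast so ncclBroadcast metrics report values.
        env["NCCL_ALLGATHERV_ENABLE"] = "0"
    return env
```

The returned mapping can be passed as the env argument of subprocess.run when launching the profiled program. Note that disabling AllGatherV changes how broadcasts execute, so whether to apply it depends on what you want to measure.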