Jog
March 14, 2025 02:22
NVIDIA's latest NCCL 2.24 release introduces new features for deep learning training by improving multi-GPU and multinode communication, including a RAS subsystem, NIC Fusion, and FP8 support.
The NVIDIA Collective Communications Library (NCCL) has reached version 2.24, which improves networking reliability and observability for multi-GPU and multinode (MGMN) communication. As reported on the NVIDIA Developer Blog, the library is optimized for NVIDIA GPUs and networking and is an essential component of multi-GPU deep learning training.
New Features in NCCL 2.24
The update contains several new features aimed at improving performance and reliability:
- Reliability, availability, and serviceability (RAS) subsystem
- User buffer (UB) registration for multinode collectives
- NIC Fusion
- Optional receive completions
- FP8 support
- Stricter enforcement of NCCL_ALGO and NCCL_PROTO
RAS Subsystem
The RAS subsystem is one of the most notable additions in NCCL 2.24. It is designed to help users diagnose application problems such as crashes and hangs, especially in large-scale deployments. This lightweight infrastructure provides a global view of the running application and can detect anomalies such as unresponsive nodes or lagging processes. It works by creating a network of threads across the NCCL processes, which monitor each other's health through regular keep-alive messages.
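The keep-alive idea can be illustrated with a small sketch. This is a toy model, not NCCL's actual RAS implementation: the `HealthMonitor` class, the peer names, and the 5-second timeout are all hypothetical and chosen only to show how silence from a peer can be turned into an "unresponsive" report.

```python
from dataclasses import dataclass, field


@dataclass
class HealthMonitor:
    """Toy keep-alive tracker; not NCCL's actual RAS implementation."""
    timeout_s: float                      # hypothetical silence threshold
    last_seen: dict = field(default_factory=dict)

    def record_keepalive(self, peer: str, now: float) -> None:
        # Remember the most recent time each peer reported in.
        self.last_seen[peer] = now

    def unresponsive(self, now: float) -> list:
        # A peer is flagged when its last keep-alive is older than the timeout.
        return sorted(p for p, t in self.last_seen.items()
                      if now - t > self.timeout_s)


monitor = HealthMonitor(timeout_s=5.0)
monitor.record_keepalive("rank0", now=0.0)
monitor.record_keepalive("rank1", now=0.0)
monitor.record_keepalive("rank0", now=4.0)   # rank0 keeps reporting
stalled = monitor.unresponsive(now=7.0)      # rank1 has been silent for 7 s
print(stalled)  # ['rank1']
```

In a real deployment every process would both send and track keep-alives, so that a stalled peer is noticed from several vantage points rather than from one central monitor.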
User Buffer Registration Improvements
NCCL 2.24 introduces user buffer (UB) registration for multinode collectives, enabling more efficient data transfers and reducing GPU resource consumption. The library now supports UB registration for collective networking with multiple ranks per node and for standard peer-to-peer networking, with gains that are most visible in operations such as AllGather and Broadcast.
NIC Fusion
As systems with many NICs become more common, NCCL has been adapted to optimize network communication on them. The new NIC Fusion feature logically merges multiple NICs into a single entity so that network resources are used efficiently. This is particularly beneficial on systems with multiple NICs per GPU, where it addresses problems such as crashes and inefficient resource allocation.
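To make "logically merging NICs" concrete, here is a minimal sketch under stated assumptions. The NIC inventory, the `fuse_nics` helper, and the rule of grouping by nearest GPU are all hypothetical; NCCL makes this decision from the real system topology, and the code below only illustrates the general idea of presenting several physical devices as one logical device.

```python
from collections import defaultdict

# Hypothetical inventory: (NIC name, GPU it sits closest to, bandwidth in Gb/s).
physical_nics = [
    ("nic0", "gpu0", 200),
    ("nic1", "gpu0", 200),
    ("nic2", "gpu1", 200),
    ("nic3", "gpu1", 200),
]


def fuse_nics(nics):
    """Group NICs that share a GPU into one logical device (toy model)."""
    groups = defaultdict(list)
    for name, gpu, gbps in nics:
        groups[gpu].append((name, gbps))
    fused = {}
    for gpu, members in groups.items():
        fused[gpu] = {
            "members": [name for name, _ in members],
            "total_gbps": sum(gbps for _, gbps in members),
        }
    return fused


logical = fuse_nics(physical_nics)
for gpu, dev in logical.items():
    # Each GPU now sees one logical device backed by two physical NICs.
    print(gpu, dev["members"], dev["total_gbps"])
```

Treating the group as a single device means a sender only has to reason about one endpoint per GPU, while traffic can still be spread across every member NIC.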
Additional Features and Fixes
The update also introduces optional receive completions for the LL and LL128 protocols, which can reduce overhead and congestion. NCCL 2.24 adds native FP8 reduction support on NVIDIA Hopper and newer architectures, extending the data types the library can process. In addition, stricter enforcement of NCCL_ALGO and NCCL_PROTO has been implemented to give users more precise tuning and better error handling.
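Because these two environment variables are now enforced more strictly, it can help to validate them before launching a job. The sketch below is one possible approach, not part of NCCL: the `nccl_env` helper is hypothetical, and the accepted values listed are only a subset of the simple, single-value form of each variable (newer releases accept a richer syntax that this sketch does not cover).

```python
import os

# A subset of values commonly used with the simple form of each variable.
KNOWN_ALGOS = {"Ring", "Tree"}
KNOWN_PROTOS = {"LL", "LL128", "Simple"}


def nccl_env(algo, proto, base=None):
    """Return a copy of the environment with NCCL_ALGO and NCCL_PROTO set."""
    if algo not in KNOWN_ALGOS:
        raise ValueError(f"unrecognized NCCL_ALGO value: {algo!r}")
    if proto not in KNOWN_PROTOS:
        raise ValueError(f"unrecognized NCCL_PROTO value: {proto!r}")
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base is None else base)
    env["NCCL_ALGO"] = algo
    env["NCCL_PROTO"] = proto
    return env


env = nccl_env("Ring", "Simple", base={})
print(env)  # {'NCCL_ALGO': 'Ring', 'NCCL_PROTO': 'Simple'}
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when starting a training script, so a typo is caught locally before the job is launched.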
The release also includes a variety of bug fixes and minor improvements, such as PAT tuning adjustments and improved memory allocation functions, which enhance the overall robustness and efficiency of the NCCL library.