Crypto Flexs

NVIDIA Releases NCCL 2.22, Offering Improved Memory Efficiency and Faster Initialization

By Crypto Flexs | September 21, 2024 | 3 Mins Read

Caroline Bishop
21 Sep 2024 13:38

NVIDIA introduces NCCL 2.22, focusing on memory efficiency, fast initialization, and cost estimation for advanced HPC and AI applications.





NVIDIA has released NCCL 2.22, the latest version of the NVIDIA Collective Communications Library (NCCL). The release reduces GPU memory usage, speeds up initialization, and introduces a cost estimation API. According to the NVIDIA technology blog, these updates are essential for high-performance computing (HPC) and artificial intelligence (AI) applications.

Release Highlights

NVIDIA Magnum IO NCCL is designed to optimize inter-GPU and multi-node communication essential for efficient parallel computing. Key features of the NCCL 2.22 release include:

  • Lazy connection establishment: GPU memory overhead is significantly reduced by delaying connection creation until a connection is actually needed.
  • New API for cost estimation: A new API helps developers optimize the overlap of computation and communication, or investigate the NCCL cost model.
  • Optimizations for ncclCommInitRank: Duplicate topology queries are eliminated, resulting in up to 90% faster initialization for applications that create multiple communicators.
  • Multi-subnet support using IB routers: Added communication support for jobs spanning multiple InfiniBand subnets, enabling large-scale DL training jobs.

Detailed features

Lazy connection establishment

NCCL 2.22 introduces lazy connection establishment, which significantly reduces GPU memory usage by delaying connection creation until it is actually needed. This is especially useful for applications that use NCCL narrowly, such as those that repeatedly run the same algorithm. The feature is enabled by default, but can be disabled by setting NCCL_RUNTIME_CONNECT=0.
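The idea of connecting only on first use can be illustrated with a small toy model. This is a minimal sketch and not NCCL's implementation: the class ToyCommunicator, its methods, and the placeholder "connection" strings are all hypothetical, and the two algorithm names are only an illustrative subset.

```python
class ToyCommunicator:
    """Toy model only: tracks which algorithms have had connections set up."""

    ALGORITHMS = ("ring", "tree")  # illustrative subset, not NCCL's full list

    def __init__(self, lazy: bool = True):
        self.lazy = lazy
        self.connections = {}  # algorithm name -> placeholder connection state
        if not lazy:
            # Eager mode: pay the setup cost for every algorithm up front.
            for algo in self.ALGORITHMS:
                self._connect(algo)

    def _connect(self, algo: str) -> None:
        # Stand-in for allocating buffers and establishing peer connections.
        self.connections[algo] = f"buffers-for-{algo}"

    def run(self, algo: str) -> str:
        if algo not in self.ALGORITHMS:
            raise ValueError(f"unknown algorithm: {algo}")
        # Lazy mode: connect on first use, then reuse the connection.
        if algo not in self.connections:
            self._connect(algo)
        return f"ran {algo}"


eager = ToyCommunicator(lazy=False)
lazy = ToyCommunicator(lazy=True)
for _ in range(3):
    eager.run("ring")
    lazy.run("ring")

print(len(eager.connections), len(lazy.connections))  # prints "2 1"
```

An application that only ever runs one algorithm never pays for the connections of the others, which is the kind of saving the release notes describe.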

New Cost Model API

The new API, ncclGroupSimulateEnd, helps developers estimate the time an operation will take, so they can better overlap computation and communication. Although the estimates may not perfectly match reality, they provide useful guidance for performance tuning.
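To show how a time estimate might guide overlap decisions, here is a minimal sketch that uses the textbook latency-bandwidth model for a ring allreduce. It is not NCCL's internal cost model and does not call NCCL; the function names, the latency and bandwidth figures, and the compute time are all assumptions chosen for illustration.

```python
def ring_allreduce_estimate_us(size_bytes, n_ranks, latency_us, bandwidth_gb_per_s):
    """Textbook estimate: 2*(n-1) steps, each moving size/n bytes."""
    if n_ranks < 2:
        return 0.0
    steps = 2 * (n_ranks - 1)
    chunk_bytes = size_bytes / n_ranks
    bytes_per_us = bandwidth_gb_per_s * 1e3  # 1 GB/s = 1e3 bytes per microsecond
    return steps * (latency_us + chunk_bytes / bytes_per_us)


def fits_under_compute(comm_us, compute_us):
    """True if the communication could be fully hidden behind the compute."""
    return comm_us <= compute_us


# Assumed numbers: 8 MB message, 8 ranks, 5 us per-step latency, 100 GB/s links.
estimate = ring_allreduce_estimate_us(8_000_000, 8, 5.0, 100.0)
print(estimate)                                        # prints "210.0"
print(fits_under_compute(estimate, compute_us=300.0))  # prints "True"
```

In a real application the estimate would come from the library rather than from a hand-written formula; the decision logic around it would look much the same.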

Initialization optimization

To minimize initialization overhead, the NCCL team introduced several optimizations, including lazy connection establishment and intra-node topology convergence. These improvements can reduce ncclCommInitRank execution time by up to 90% for applications that create multiple communicators.

New tuner plugin interface

The new tuner plugin interface (v3) provides a per-collective 2D cost table that reports the estimated time required for an operation under each combination of algorithm and protocol. This allows external tuners to choose the combination of algorithm and protocol that performs best.
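A 2D cost table of this kind can be pictured as rows of algorithms and columns of protocols. The sketch below is not the plugin interface itself (which is a C interface); the table values, the use of None for unavailable combinations, and the pick_cheapest helper are assumptions made for illustration.

```python
ALGORITHMS = ("tree", "ring")           # row labels
PROTOCOLS = ("LL", "LL128", "Simple")   # column labels


def pick_cheapest(cost_table):
    """Return the (algorithm, protocol) pair with the lowest estimated time."""
    best = None
    for a, row in enumerate(cost_table):
        for p, cost in enumerate(row):
            if cost is None:  # combination not available in this toy table
                continue
            if best is None or cost < best[0]:
                best = (cost, ALGORITHMS[a], PROTOCOLS[p])
    if best is None:
        raise ValueError("no usable algorithm/protocol combination")
    return best[1], best[2]


# Made-up estimated times in microseconds.
table = [
    [12.0, 9.5, 20.0],   # tree
    [15.0, None, 8.0],   # ring
]
print(pick_cheapest(table))  # prints "('ring', 'Simple')"

# An external tuner could steer the choice by lowering an entry.
table[0][1] = 0.0
print(pick_cheapest(table))  # prints "('tree', 'LL128')"
```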

Static plugin linking

For convenience and to avoid loading problems, NCCL 2.22 supports static linking of network or tuner plugins. Applications can request this by setting NCCL_NET_PLUGIN or NCCL_TUNER_PLUGIN to STATIC_PLUGIN.
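One way a launcher script might apply these settings is to build the environment for the training process before starting it. This is a minimal sketch: the helper with_static_plugins is hypothetical, and it assumes the variables need to be in place before the application initializes NCCL.

```python
import os


def with_static_plugins(base_env=None, net=True, tuner=False):
    """Return a copy of the environment requesting statically linked plugins."""
    env = dict(os.environ if base_env is None else base_env)
    if net:
        env["NCCL_NET_PLUGIN"] = "STATIC_PLUGIN"
    if tuner:
        env["NCCL_TUNER_PLUGIN"] = "STATIC_PLUGIN"
    return env


env = with_static_plugins({}, net=True, tuner=True)
print(env)
```

The resulting dictionary could then be passed as the env argument of subprocess.run when launching the application.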

Group semantics for abort or destroy

NCCL 2.22 introduces group semantics for ncclCommDestroy and ncclCommAbort, allowing multiple communicators to be destroyed simultaneously. This feature aims to prevent deadlocks and improve user experience.
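The general pattern behind group semantics, collecting several calls and carrying them out together when the group closes, can be sketched as follows. This is a toy model and not NCCL code: ToyGroup and the dictionary-based "communicators" are hypothetical, and in NCCL itself the grouping would presumably be expressed by placing the calls between ncclGroupStart and ncclGroupEnd.

```python
class ToyGroup:
    """Toy model: defers destroy requests and applies them together on exit."""

    def __init__(self):
        self.pending = []

    def __enter__(self):
        return self

    def destroy(self, comm: dict) -> None:
        # Nothing happens yet; the request is only recorded.
        self.pending.append(comm)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            # All recorded communicators are torn down as one batch.
            for comm in self.pending:
                comm["alive"] = False
        self.pending.clear()
        return False  # do not swallow exceptions


comm_a = {"name": "a", "alive": True}
comm_b = {"name": "b", "alive": True}

with ToyGroup() as group:
    group.destroy(comm_a)
    group.destroy(comm_b)
    still_alive_inside = comm_a["alive"] and comm_b["alive"]

print(still_alive_inside, comm_a["alive"], comm_b["alive"])  # prints "True False False"
```

Because the teardown happens as a single batch, the order in which the individual destroy calls were issued inside the group no longer matters in this model.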

IB Router Support

This release allows NCCL to operate across multiple InfiniBand subnets, improving communications in large networks. The library automatically detects and establishes connections between endpoints across multiple subnets using FLID for higher performance and adaptive routing.

Bug fixes and minor updates

The NCCL 2.22 release also includes several bug fixes and minor updates.

  • Support for the allreduce tree algorithm on DGX Google Cloud.
  • Logging NIC names for IB asynchronous errors.
  • Improved performance of registered send and receive operations.
  • Added infrastructure code for NVIDIA Trusted Computing solutions.
  • Provides separate traffic classes for IB and RoCE control messages to support advanced QoS.
  • Supports PCI peer-to-peer communication between partitioned Broadcom PCI switches.

Summary

The NCCL 2.22 release introduces several important features and optimizations to improve the performance and efficiency of HPC and AI applications. Improvements include a new tuner plugin interface, support for static linking of plugins, and improved group semantics to prevent deadlocks.
