According to the NVIDIA blog, NVIDIA’s Blackwell platform achieved remarkable results in the MLPerf Training 4.1 industry benchmarks, setting a new standard across a variety of workloads. The platform delivers up to 2.2x higher performance per GPU on large language model (LLM) benchmarks, including Llama 2 70B fine-tuning and GPT-3 175B pretraining.
Leap with Blackwell
The Blackwell architecture’s initial submission to the MLCommons consortium highlighted its role in improving generative AI training performance. Key to this achievement is a new kernel that optimizes the use of Tensor Cores, the specialized units that accelerate matrix multiplication, the fundamental operation of many deep learning algorithms. These optimizations allow Blackwell to achieve higher compute throughput per GPU while leveraging much larger and faster high-bandwidth memory.
In particular, the platform’s efficiency is highlighted by its ability to run the GPT-3 175B LLM benchmark with just 64 GPUs while maintaining excellent per-GPU performance. The same task required 256 GPUs on the Hopper platform, underscoring Blackwell’s superior efficiency and capabilities.
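As a quick sanity check on the figures above, the GPU-count reduction can be computed directly (a back-of-the-envelope sketch; only the two counts quoted in the article are used, and no time-to-train numbers are assumed):

```python
# Compare the GPU counts cited for the GPT-3 175B benchmark.
# This only computes the reduction in GPU count, not an absolute
# speedup, since time-to-train figures are not quoted here.

hopper_gpus = 256     # GPUs the Hopper platform needed for the benchmark
blackwell_gpus = 64   # GPUs Blackwell needed for the same benchmark

gpu_reduction = hopper_gpus / blackwell_gpus
print(f"Blackwell ran the benchmark with {gpu_reduction:.0f}x fewer GPUs")
```

That is a 4x reduction in cluster size for the same benchmark, independent of any per-GPU performance claim.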
Constant optimization
NVIDIA continues to improve its platform through continuous software development, raising performance and adding features across frameworks and applications. The latest MLPerf Training submission shows a 1.3x improvement in per-GPU training performance on GPT-3 175B for Hopper since the benchmark’s introduction.
NVIDIA also achieved large-scale results using 11,616 Hopper GPUs connected via NVIDIA NVLink and NVSwitch for high-bandwidth GPU-to-GPU communication, together with NVIDIA Quantum-2 InfiniBand networking. This setup more than tripled in scale and performance on the GPT-3 175B benchmark compared to the previous year’s submission.
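The scaling claim can be turned into a simple bound (a hypothetical sketch; the only inputs are the figures quoted above, and the previous year’s exact GPU count is not stated in this article):

```python
# "More than tripled in scale" with 11,616 GPUs this year implies an
# upper bound on the previous year's submission size.

current_gpus = 11_616
scale_factor = 3  # "more than tripled"

previous_gpus_upper_bound = current_gpus / scale_factor
print(f"Previous-year submission used fewer than "
      f"{previous_gpus_upper_bound:.0f} GPUs")
```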
Broad partner participation
NVIDIA’s success is also reflected in the contributions of its partners, including leading system manufacturers and cloud service providers such as ASUSTeK, Azure, Cisco, Dell, and Fujitsu, who submitted impressive results to MLPerf. As a founding member of MLCommons, NVIDIA emphasizes the importance of industry-standard benchmarks in AI computing, which provide critical data for enterprises making platform investment decisions.
Through continuous advancement and optimization, NVIDIA’s accelerated computing platform sets a new standard in AI training, delivering improved performance and greater return on investment for both partners and customers.