Louisa Crawford
June 4, 2025 17:51
NVIDIA’s Blackwell architecture shows significant performance improvements in MLPerf Training v5.0, delivering up to 2.6 times faster training across various benchmarks.
NVIDIA’s latest Blackwell architecture has demonstrated significant progress in artificial intelligence, delivering up to 2.6 times the performance of its predecessor on the MLPerf Training v5.0 benchmarks. According to NVIDIA, the results underscore the architectural advances Blackwell brings to the table, particularly for large language models (LLMs) and other demanding AI applications.
Blackwell’s architectural innovations
Blackwell introduces several improvements over its predecessor, the Hopper architecture. These include fifth-generation NVLink and NVLink Switch technology, which greatly increase bandwidth between GPUs, an improvement critical to reducing training time and increasing throughput. Blackwell’s second-generation Transformer Engine and HBM3e memory also contribute to faster, more efficient model training.
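To see why inter-GPU bandwidth matters so much for training time, consider the gradient all-reduce that synchronizes GPUs every step. A minimal sketch of the standard ring all-reduce cost model (the figures in the usage example are hypothetical, not NVIDIA’s published specifications):

```python
def allreduce_time_s(grad_gb: float, n_gpus: int, bw_gbs: float) -> float:
    """Estimate ring all-reduce time in seconds.

    In a ring all-reduce, each GPU sends and receives roughly
    2 * (n - 1) / n times the gradient volume over its link, so the
    time scales inversely with per-GPU interconnect bandwidth.

    grad_gb -- gradient volume per GPU in gigabytes
    n_gpus  -- number of GPUs in the ring
    bw_gbs  -- per-GPU link bandwidth in GB/s
    """
    volume_gb = 2 * (n_gpus - 1) / n_gpus * grad_gb
    return volume_gb / bw_gbs


# Hypothetical example: 140 GB of gradients across 72 GPUs.
# Doubling link bandwidth halves the communication time.
slow = allreduce_time_s(140, 72, 900)
fast = allreduce_time_s(140, 72, 1800)
```

Under this simple model, the communication term shrinks in direct proportion to the bandwidth gain, which is why a faster NVLink generation translates into shorter training steps once compute is no longer the bottleneck.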
Building on these advances, NVIDIA’s GB200 NVL72 systems achieved striking results, training the Llama 3.1 405B model substantially faster than Hopper-based systems, with submissions scaling to as many as 1,960 training processes.
Benchmark performance
MLPerf Training v5.0, known as a rigorous benchmark suite, includes tests across domains such as LLM pretraining, text-to-image generation, and graph neural networks. NVIDIA’s platforms delivered strong results on seven benchmarks, demonstrating both speed and efficiency.
For example, in LLM fine-tuning with the Llama 2 70B model, Blackwell GPUs achieved 2.5 times the speed of the previous submission on DGX H100 systems. Similarly, the Stable Diffusion v2 pretraining benchmark saw a 2.6x per-GPU gain, setting new performance records at scale.
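Per-GPU comparisons like the 2.6x figure require normalizing throughput by submission size, since different MLPerf submissions may use different GPU counts. A minimal sketch of that arithmetic (the throughput numbers below are purely illustrative, not MLPerf results):

```python
def per_gpu_speedup(new_tput: float, new_gpus: int,
                    old_tput: float, old_gpus: int) -> float:
    """Compare two submissions fairly by normalizing throughput
    (e.g., samples/second) to a per-GPU basis before dividing."""
    return (new_tput / new_gpus) / (old_tput / old_gpus)


# Illustrative only: a 4-GPU run at 1,300 samples/s vs. an
# 8-GPU run at 1,000 samples/s is a 2.6x per-GPU improvement.
speedup = per_gpu_speedup(1300, 4, 1000, 8)
```

The normalization is what makes a "per-GPU" claim meaningful even when the newer submission uses fewer accelerators than the older one.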
Implications and future prospects
These performance improvements not only highlight the capability of the Blackwell architecture but also pave the way for faster deployment of AI models. Faster training and fine-tuning mean organizations can bring AI applications to market sooner, strengthening their competitive advantage.
NVIDIA’s continuous focus on optimizing its software stack, including libraries such as cuBLAS and cuDNN, plays an important role in these performance gains. These optimizations make it easier to exploit Blackwell’s enhanced compute capacity, particularly with new AI data formats.
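The “AI data formats” in question are narrow floating-point types such as FP8, which trade mantissa precision for higher throughput. A purely illustrative Python sketch (not NVIDIA’s implementation) of how truncating mantissa bits loses precision:

```python
import math


def quantize_fp(x: float, mantissa_bits: int) -> float:
    """Round x to a float with the given number of explicit mantissa
    bits, mimicking the precision loss of narrow formats
    (FP8 E4M3, for instance, keeps only 3 mantissa bits).
    Exponent range limits and saturation are ignored here.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)  # x = m * 2**e, with 0.5 <= |m| < 1
    scale = 2 ** (mantissa_bits + 1)
    return round(m * scale) / scale * 2 ** e


# Powers of two survive exactly; most values round to a neighbor.
exact = quantize_fp(1.0, 3)    # stays 1.0
lossy = quantize_fp(0.1, 3)    # rounds to a nearby representable value
```

Training frameworks tolerate this rounding error by keeping master weights and sensitive accumulations in higher precision, which is what lets the narrow formats deliver speed without destabilizing convergence.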
With these developments, NVIDIA is positioned to extend its leadership in AI hardware, delivering solutions that meet the growing demands of ever-larger AI models.
For more detail on NVIDIA’s MLPerf Training v5.0 results, visit the NVIDIA blog.
Image Source: Shutterstock