Peter Jang
June 4, 2025 18:17
NVIDIA outlines how to reproduce its MLPerf Training v5.0 scores for the LLM benchmarks, covering the hardware prerequisites and the steps to run them.
NVIDIA has detailed the process for reproducing its training scores on the MLPerf Training v5.0 benchmarks, specifically Llama 2 70B LoRA fine-tuning and Llama 3.1 405B pretraining. This follows NVIDIA's earlier announcement that it achieved up to 2.6x higher performance in MLPerf Training v5.0, as reported by Sukru Burc Eryilmaz on the NVIDIA blog. Both benchmarks belong to the MLPerf suite, which measures machine learning performance.
Prerequisites for benchmarking
To run these benchmarks, you need to meet certain hardware and software requirements. Llama 2 70B LoRA requires an NVIDIA DGX B200 or GB200 NVL72 system, while Llama 3.1 405B requires four GB200 NVL72 systems connected through InfiniBand. Substantial disk space is also required: 2.5 TB for Llama 3.1 and 300 GB for LoRA fine-tuning.
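Before downloading anything, it can help to confirm that the target volume has enough free space. The following is a minimal sketch, assuming the figures above are 300 GB for LoRA fine-tuning and 2.5 TB for Llama 3.1 405B, in decimal units. The dictionary keys and function names are hypothetical and are not part of NVIDIA's tooling.

```python
import shutil

# Disk requirements as read from the article (decimal units assumed).
# The keys are hypothetical labels chosen for this sketch.
REQUIRED_BYTES = {
    "llama2_70b_lora": 300 * 10**9,    # 300 GB
    "llama31_405b": 2500 * 10**9,      # 2.5 TB
}


def has_enough_space(free_bytes, benchmark):
    """Return True if free_bytes covers the stated requirement."""
    return free_bytes >= REQUIRED_BYTES[benchmark]


def check_path(path, benchmark):
    """Check the free space of the filesystem that contains path."""
    free_bytes = shutil.disk_usage(path).free
    return has_enough_space(free_bytes, benchmark)


if __name__ == "__main__":
    for name in REQUIRED_BYTES:
        status = "ok" if check_path(".", name) else "insufficient space"
        print(f"{name}: {status}")
```

Keeping the comparison in `has_enough_space` separate from the filesystem call makes the threshold logic easy to verify without a large disk.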
Cluster and environment setup
NVIDIA uses a cluster managed by NVIDIA Base Command Manager (BCM), and the benchmarks require an environment based on Slurm, Pyxis, and Enroot. To minimize data bottlenecks, fast local storage configured in RAID0 is recommended. Networking should combine NVIDIA NVLink and InfiniBand for optimal performance.
Running the benchmark
The process starts with building a Docker container and downloading the required datasets and checkpoints. The benchmark is then launched through Slurm, with a configuration file that specifies the hyperparameters and system settings. The setup is flexible and can be adapted to different system sizes and requirements.
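As a rough illustration of the Slurm launch step, the sketch below only assembles an `sbatch` command line and does not submit it. The `--nodes` and `--time` options are standard `sbatch` options; the script name `run.sub`, the node count, and the time limit are hypothetical placeholders, and the real values come from NVIDIA's configuration files.

```python
import shlex


def build_sbatch_command(script, nodes, walltime_minutes):
    """Assemble an sbatch argument list without submitting the job.

    script, nodes and walltime_minutes are placeholders in this sketch;
    the actual values depend on the chosen configuration file.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    if walltime_minutes < 1:
        raise ValueError("walltime_minutes must be at least 1")
    return [
        "sbatch",
        f"--nodes={nodes}",             # number of nodes to allocate
        f"--time={walltime_minutes}",   # time limit, given in minutes
        script,
    ]


if __name__ == "__main__":
    # "run.sub" is a hypothetical batch script name.
    cmd = build_sbatch_command("run.sub", nodes=1, walltime_minutes=60)
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument intact and makes it easy to inspect before passing it to a job launcher.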
Benchmark log analysis
The benchmark run produces a log containing the key MLPerf markers. This log provides insight into initialization, training progress, and final accuracy. The run is considered successful once it reaches the target evaluation loss.
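A log of this kind can be scanned with a short script. The sketch below assumes each marker line contains the `:::MLLOG` prefix followed by a JSON object with `key`, `value`, and `time_ms` fields, which is the convention used by the MLPerf logging package. The default evaluation key name, the threshold, and the sample lines are illustrative assumptions, not output from a real run.

```python
import json

MLLOG_PREFIX = ":::MLLOG"


def parse_mllog_lines(lines):
    """Extract the JSON payload of every line carrying the MLLOG prefix."""
    events = []
    for line in lines:
        idx = line.find(MLLOG_PREFIX)
        if idx == -1:
            continue  # ordinary log output, not an MLPerf marker
        payload = line[idx + len(MLLOG_PREFIX):].strip()
        try:
            events.append(json.loads(payload))
        except json.JSONDecodeError:
            continue  # skip truncated or malformed marker lines
    return events


def reached_target(events, target_loss, eval_key="eval_accuracy"):
    """Return True if any evaluation value is at or below target_loss.

    eval_key is an assumption; use the key your benchmark actually logs.
    """
    for event in events:
        value = event.get("value")
        if event.get("key") == eval_key and isinstance(value, (int, float)):
            if value <= target_loss:
                return True
    return False


def run_duration_ms(events):
    """Return run_stop minus run_start in milliseconds, or None."""
    times = {e.get("key"): e.get("time_ms") for e in events}
    start, stop = times.get("run_start"), times.get("run_stop")
    if start is None or stop is None:
        return None
    return stop - start


# Made-up sample lines for illustration only.
SAMPLE_LINES = [
    "unrelated framework output",
    ':::MLLOG {"key": "run_start", "value": null, "time_ms": 1000}',
    ':::MLLOG {"key": "eval_accuracy", "value": 0.95, "time_ms": 2000}',
    ':::MLLOG {"key": "eval_accuracy", "value": 0.92, "time_ms": 3000}',
    ':::MLLOG {"key": "run_stop", "value": null, "time_ms": 3500}',
]

if __name__ == "__main__":
    events = parse_mllog_lines(SAMPLE_LINES)
    # 0.925 is an illustrative threshold, not a value taken from this article.
    print("target reached:", reached_target(events, target_loss=0.925))
    print("duration (ms):", run_duration_ms(events))
```

To apply this to a real log, pass the open file object to `parse_mllog_lines` in place of `SAMPLE_LINES`.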
See the NVIDIA blog for detailed guidelines, including specific scripts and configuration examples.
Image source: Shutterstock