James Ding
June 6, 2025 10:02
According to NVIDIA’s blog, the company has introduced the Nemotron-H reasoning model family to deliver high throughput across a variety of reasoning-intensive applications.
In a significant development for artificial intelligence, NVIDIA has announced the Nemotron-H reasoning model family, designed to improve throughput without sacrificing accuracy. The models are built to handle inference-intensive tasks in mathematics and science, where responses can run to tens of thousands of tokens.
Innovation in AI Reasoning Models
NVIDIA’s latest release comprises the Nemotron-H-47B-Reasoning-128K and Nemotron-H-8B-Reasoning-128K models, each also available in an FP8-quantized variant. According to NVIDIA’s blog, these models are derived from the Nemotron-H-47B-Base-8K and Nemotron-H-8B-Base-8K base models.
The most capable model in the family, Nemotron-H-47B-Reasoning-128K, offers nearly four times the throughput of comparable transformer models such as Llama-Nemotron Super 49B v1.0. It supports a 128K-token context and delivers strong accuracy on reasoning-heavy benchmarks. Similarly, the Nemotron-H-8B-Reasoning-128K model shows significant improvements over Llama-Nemotron Nano 8B v1.0.
Innovative Features and Licensing
The Nemotron-H models introduce runtime control over reasoning, letting users choose between reasoning and non-reasoning modes. This adaptability suits a wide range of real-world applications. NVIDIA has released the models under an open research license to encourage the research community to explore and build on them.
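To illustrate the idea of a runtime reasoning toggle, here is a minimal sketch of how such a switch might be expressed through the system prompt. The control phrase and message layout below are assumptions for illustration, not NVIDIA’s documented interface.

```python
# Hypothetical sketch of toggling reasoning mode via the system prompt.
# The control phrase "Reasoning mode: on/off" is an assumption, not
# NVIDIA's documented API for Nemotron-H.

def build_messages(question: str, reasoning: bool) -> list[dict]:
    """Build a chat transcript whose system prompt toggles reasoning."""
    system = "Reasoning mode: on" if reasoning else "Reasoning mode: off"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]

# With reasoning enabled the model would emit its chain of thought
# before the final answer; with it disabled, it answers directly.
msgs = build_messages("What is 17 * 24?", reasoning=True)
```

The appeal of a prompt-level switch is that a single deployed model can serve both latency-sensitive requests and harder problems that benefit from extended reasoning.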
Training and Performance
Training included supervised fine-tuning (SFT) on examples containing explicit reasoning traces. This training regime, spanning more than 30,000 steps across mathematics, science, and coding, produced consistent gains on internal STEM benchmarks. Subsequent training stages focused on instruction following, safety alignment, and dialogue, further improving model performance across tasks.
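The structure of such an SFT example can be sketched as follows. The `<think>` delimiters and field names here are hypothetical, chosen only to show how a reasoning trace might be paired with a final answer; the source does not specify NVIDIA’s actual data format.

```python
# Illustrative sketch (not NVIDIA's actual data format): one supervised
# fine-tuning example pairing a prompt with a response that contains an
# explicit reasoning trace, delimited by hypothetical <think> tags.

def format_sft_example(prompt: str, trace: str, answer: str) -> dict:
    """Package an SFT example whose target includes a reasoning trace."""
    return {
        "prompt": prompt,
        "response": f"<think>{trace}</think>\n{answer}",
    }

ex = format_sft_example(
    prompt="Solve 3x + 5 = 20.",
    trace="Subtract 5 from both sides: 3x = 15. Divide by 3: x = 5.",
    answer="x = 5",
)
```

Training on targets that interleave the trace with the answer is what lets the model later produce (or suppress) the trace at inference time.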
Long-Context Processing and Reinforcement Learning
To support the 128K-token context, the models were trained on synthetic sequences of up to 256K tokens, improving performance on long-context tasks. In addition, reinforcement learning with Group Relative Policy Optimization (GRPO) was applied to improve overall response quality, refining skills such as instruction following and tool calling.
Benchmark Results and Throughput Comparison
Benchmarked against models such as Llama-Nemotron Super 49B v1.0 and Qwen3 32B, the Nemotron-H-47B-Reasoning-128K model demonstrated excellent accuracy and throughput. In particular, it achieved roughly four times the throughput of traditional transformer-based models, a notable advance in AI model efficiency.
Overall, the Nemotron-H reasoning models offer a versatile, high-performance foundation for applications that demand both precision and speed, marking a significant step forward in AI reasoning capability.
For more information, see the official announcement on the NVIDIA blog.
Image source: Shutterstock