NVIDIA has announced that DeepSeek-R1, an AI model with an impressive 671 billion parameters, is now available for preview as an NVIDIA NIM microservice, according to a recent NVIDIA blog post. DeepSeek-R1 is designed to let developers build specialized AI agents with state-of-the-art reasoning.
Unique Features of DeepSeek-R1
DeepSeek-R1 is an open model that delivers accurate responses through advanced reasoning. Unlike traditional models, it performs multiple inference passes over a query, using methods such as chain-of-thought and consensus to arrive at the best answer. Known as test-time scaling, this process demonstrates why accelerated computing matters for agentic AI reasoning.
The model's design allows it to 'think' through problems iteratively, producing more output tokens over longer generation cycles. This scaling is essential for achieving high-quality responses and demands substantial test-time compute resources.
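The consensus approach mentioned above can be illustrated with a minimal majority-vote sketch. This is not DeepSeek-R1's internal mechanism, just a generic "self-consistency" pattern, assuming a hypothetical `sample_fn` that runs one reasoning pass and returns a final answer string:

```python
from collections import Counter

def consensus_answer(sample_fn, query, n_samples=8):
    """Majority vote over repeated reasoning samples (self-consistency).

    sample_fn is a hypothetical callable that runs one full reasoning
    pass of a model on the query and returns its final answer string.
    """
    answers = [sample_fn(query) for _ in range(n_samples)]
    # The most frequent final answer across samples wins the vote.
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a model: a fixed sequence of sampled answers.
demo_answers = iter(["42", "42", "41", "42", "42", "41", "42", "42"])
result = consensus_answer(lambda q: next(demo_answers), "what is 6*7?")
```

Each extra sample costs a full generation cycle, which is exactly why this kind of test-time scaling multiplies the compute required per query.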
NIM Microservice Improvements
The DeepSeek-R1 model is now available for developers to experiment with as a microservice on NVIDIA's build platform. The microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system, demonstrating high inference efficiency and accuracy on tasks that demand logical inference, reasoning, and language understanding.
To ease deployment, the NIM microservice supports industry-standard APIs, allowing enterprises to maximize security and data privacy by running it on their preferred infrastructure. In addition, NVIDIA AI Foundry and NVIDIA NeMo software let enterprises create customized DeepSeek-R1 NIM microservices for specialized AI applications.
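Because the microservice exposes industry-standard APIs, a request can be built in the familiar OpenAI-style chat-completions format. The endpoint URL and model identifier below are assumptions for illustration; check NVIDIA's build platform for the actual values:

```python
import json

# Hypothetical endpoint and model name; verify against NVIDIA's
# build platform documentation before use.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_request(prompt, model="deepseek-ai/deepseek-r1", max_tokens=1024):
    """Build an OpenAI-style chat-completions payload as a JSON string."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Reasoning models generate long chains of thought, so a
        # generous token budget is usually needed.
        "max_tokens": max_tokens,
        "temperature": 0.6,
    })

payload = build_request("Prove that the square root of 2 is irrational.")
# POST payload to API_URL with an "Authorization: Bearer <api key>" header.
```

Using a standard request shape like this is what lets enterprises swap the hosted preview for a self-hosted NIM on their own infrastructure without changing client code.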
Technical Specifications and Performance
DeepSeek-R1 is a mixture-of-experts (MoE) model featuring 256 experts per layer, with each token routed to eight separate experts in parallel for evaluation. Serving the model in real time requires a large number of GPUs with substantial compute, connected by high-bandwidth, low-latency communication to route prompt tokens to the experts efficiently.
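The routing step described above can be sketched generically. This is a minimal illustration of top-k expert selection as used in MoE layers, not DeepSeek-R1's actual router implementation:

```python
import numpy as np

def route_tokens(gate_logits, k=8):
    """Top-k expert routing for an MoE layer (illustrative sketch).

    gate_logits: array of shape (num_tokens, num_experts) holding the
    router's score for each (token, expert) pair. Returns the indices
    of the k experts selected per token and the softmax weights used
    to combine their outputs.
    """
    # Pick the k highest-scoring experts for each token.
    topk = np.argsort(gate_logits, axis=1)[:, -k:]
    topk_logits = np.take_along_axis(gate_logits, topk, axis=1)
    # Normalize only the selected scores into combination weights.
    w = np.exp(topk_logits - topk_logits.max(axis=1, keepdims=True))
    weights = w / w.sum(axis=1, keepdims=True)
    return topk, weights

rng = np.random.default_rng(0)
# 4 tokens, 256 experts per layer, top-8 routing as described above.
experts, weights = route_tokens(rng.normal(size=(4, 256)), k=8)
```

Since different tokens land on different experts, which may live on different GPUs, the interconnect carrying these routed activations becomes a first-order performance concern.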
The FP8 Transformer Engine and NVLink bandwidth of the NVIDIA Hopper architecture play an important role in achieving the model's high throughput. This configuration allows a single server with eight H200 GPUs to run the entire model efficiently, delivering significant compute performance.
Future Prospects
The upcoming NVIDIA Blackwell architecture is set to improve test-time scaling for reasoning models such as DeepSeek-R1. Its fifth-generation Tensor Cores deliver up to 20 petaflops of peak FP4 compute, further optimizing inference workloads.
Developers interested in exploring the capabilities of the DeepSeek-R1 NIM microservice can do so on NVIDIA's build platform, opening the way for innovative AI solutions across various sectors.
Image source: Shutterstock