Felix Pinkston
February 13, 2025 18:01
NVIDIA is using the DeepSeek-R1 model with inference-time scaling to improve GPU kernel generation, allocating additional computational resources during inference to optimize AI model performance.
NVIDIA has showcased a technique known as inference-time scaling, powered by the DeepSeek-R1 model, that marks a significant development in AI model efficiency. According to NVIDIA, the method optimizes GPU kernel generation by strategically allocating computational resources during inference to improve performance.
The Role of Inference-Time Scaling
Inference-time scaling, also known as AI reasoning or long thinking, allows an AI model to evaluate multiple potential outcomes and select the best one. This approach mirrors human problem-solving techniques, enabling more strategic and systematic solutions to complex problems.
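At its simplest, this amounts to spending extra compute on several candidate answers and keeping the one a scoring function rates highest. The Python sketch below illustrates the idea; `generate` and `score` are hypothetical placeholders for a model call and a verifier, not NVIDIA's actual implementation.

```python
import random

def generate(prompt: str) -> str:
    """Hypothetical stand-in for one sampled completion from a reasoning model."""
    return f"candidate solution #{random.randint(0, 10**6)} for: {prompt}"

def score(candidate: str) -> float:
    """Hypothetical verifier that rates a candidate; higher is better."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Inference-time scaling in its simplest form: sample n candidates
    (more compute at inference) and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=score)

print(best_of_n("Write an optimized attention kernel", n=8))
```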
In recent experiments, NVIDIA engineers used the DeepSeek-R1 model with increased compute during inference to automatically generate GPU attention kernels. The resulting kernels were numerically accurate and, in some cases, outperformed those hand-written by skilled engineers, without any explicit programming.
The Challenge of Optimizing Attention Kernels
The attention mechanism, central to the development of large language models (LLMs), allows AI to focus selectively on the most relevant segments of the input, improving predictions and uncovering hidden patterns in data. However, the computational demand of attention operations grows quadratically with the length of the input sequence, so optimized GPU kernel implementations are required to avoid runtime errors and improve computational efficiency.
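To make the quadratic cost concrete, the NumPy sketch below implements plain scaled dot-product attention; the seq_len-by-seq_len score matrix is the part whose memory and compute grow quadratically with sequence length. It is a minimal illustration, not an optimized kernel.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Minimal attention: the (seq_len, seq_len) score matrix is what
    grows quadratically with the input sequence length."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # shape: (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v

seq_len, d_model = 1024, 64
q, k, v = (np.random.randn(seq_len, d_model) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)          # the scores alone hold 1024 * 1024 floats
print(out.shape)
```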
Attention variants, such as causal attention and relative positional embeddings, complicate kernel optimization further. Multi-modal models, such as vision transformers, introduce additional complexity by requiring specialized attention mechanisms to preserve spatio-temporal information.
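As an example of such a variant, causal attention masks out future positions so that each token can only attend to earlier ones. The sketch below assumes the same minimal NumPy setting as above and only illustrates how the mask changes the computation.

```python
import numpy as np

def causal_attention(q, k, v):
    """Causal variant: mask positions above the diagonal so position i
    can only attend to positions j <= i."""
    seq_len, d_k = q.shape
    scores = q @ k.T / np.sqrt(d_k)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # True for future positions
    scores = np.where(mask, -np.inf, scores)                      # exclude them from the softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

q, k, v = (np.random.randn(8, 4) for _ in range(3))
print(causal_attention(q, k, v).shape)
```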
Innovative Workflow Using DeepSeek-R1
NVIDIA's engineers developed a novel workflow that pairs DeepSeek-R1 with a verifier during inference in a closed-loop system. The process starts with a manual prompt, generates initial GPU code, and then iteratively analyzes and refines it based on feedback from the verifier.
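The overall shape of such a closed loop can be sketched as a generate-verify-refine cycle. In the Python sketch below, `llm_generate` and `verify` are hypothetical placeholders for the DeepSeek-R1 call and NVIDIA's verifier, not the actual implementation.

```python
import random

def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for a DeepSeek-R1 call returning candidate kernel code."""
    return f"// candidate kernel generated from a prompt of {len(prompt)} characters"

def verify(kernel_code: str) -> tuple[bool, str]:
    """Hypothetical verifier: e.g. compile the kernel and compare its output
    against a reference implementation; returns (passed, feedback)."""
    passed = random.random() < 0.3
    return passed, "" if passed else "numerical mismatch against the reference attention"

def closed_loop_generation(task_prompt: str, budget: int = 20) -> str | None:
    """Generate -> verify -> refine until a kernel passes or the budget runs out.
    Raising the budget is the inference-time scaling knob."""
    prompt = task_prompt
    for attempt in range(budget):
        kernel = llm_generate(prompt)
        passed, feedback = verify(kernel)
        if passed:
            return kernel
        # Fold the verifier's feedback back into the next prompt.
        prompt = f"{task_prompt}\n\nAttempt {attempt + 1} failed: {feedback}\nPlease fix it."
    return None

print(closed_loop_generation("Write a causal attention kernel"))
```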
This method significantly improved the generation of attention kernels, achieving numerical correctness on 96% of Level-1 and Level-2 problems, as benchmarked by Stanford's KernelBench.
Future Prospects
The introduction of inference-time scaling with DeepSeek-R1 shows promising progress in GPU kernel generation. While the initial results are encouraging, continued research and development will be essential to achieve consistent results across a broader range of problems.
Developers and researchers interested in exploring this technology can access the DeepSeek-R1 NIM microservice on NVIDIA's build platform.
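As a starting point, the sketch below assumes the microservice exposes the OpenAI-compatible chat endpoint that NIM services typically provide; the base URL, model identifier, and API-key variable are assumptions to verify against NVIDIA's current documentation on build.nvidia.com.

```python
import os
from openai import OpenAI

# Assumed values: check build.nvidia.com for the current endpoint, model id, and auth setup.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NIM endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed credential variable
)

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",                 # assumed model identifier
    messages=[{"role": "user", "content": "Generate an optimized causal attention GPU kernel."}],
    temperature=0.6,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```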
Image Source: Shutterstock