Jessie A. Ellis
January 27, 2026 19:22
NVIDIA has released FastGen, an open source library that accelerates diffusion model inference by up to 100x. A 14B-parameter video model can now be distilled into a few-step generator in 16 hours on 64 H100 GPUs.
On January 27, NVIDIA released FastGen, an open source library that promises to cut diffusion model inference times by 10x to 100x. The toolkit targets one of generative AI’s most serious bottlenecks: getting these models to produce output fast enough for real-world use.
Standard diffusion models require tens to hundreds of denoising steps per generation. For images, that’s an annoyance. For video, it’s a deal breaker: generating a single clip can take minutes to hours, putting real-time applications out of reach.
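To see why step count dominates cost, here is a minimal sampling-loop sketch in PyTorch. The denoiser is a toy placeholder, not a real model; the point is that every step is a full network forward pass, so runtime scales roughly linearly with the number of steps:

```python
import torch

# Toy stand-in for the denoising network; a real model has billions of
# parameters, which is what makes each step expensive.
def denoiser(x: torch.Tensor, t: int) -> torch.Tensor:
    return 0.95 * x  # placeholder prediction, not a real denoiser

def sample(num_steps: int, shape=(1, 3, 64, 64)) -> torch.Tensor:
    """Diffusion-style sampling loop: one full network forward pass per
    step, so wall-clock time scales roughly linearly with num_steps."""
    x = torch.randn(shape)               # start from pure noise
    for t in reversed(range(num_steps)):
        x = denoiser(x, t)               # each iteration is a full model call
    return x

baseline = sample(num_steps=50)   # a typical many-step sampler
distilled = sample(num_steps=4)   # a few-step generator: ~12x fewer calls
```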
FastGen attacks this through distillation: training a fast student model that generates in just a few steps to mimic the output of a slow, many-step teacher. The library bundles trajectory-based approaches (such as OpenAI’s iCT and MIT’s MeanFlow) and distribution-based methods (Stability AI’s LADD and Adobe’s DMD) under one roof.
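As a rough illustration of the idea only, here is a minimal regression-style distillation step in PyTorch. The teacher and student are toy modules, and the plain MSE-to-teacher-output loss is a simplification; the bundled methods above each use far more elaborate objectives:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins: in practice both are full diffusion backbones.
teacher = torch.nn.Conv2d(3, 3, kernel_size=1)  # frozen, many-step model
student = torch.nn.Conv2d(3, 3, kernel_size=1)  # trainable, few-step model
opt = torch.optim.AdamW(student.parameters(), lr=1e-4)

def teacher_sample(x: torch.Tensor, steps: int = 32) -> torch.Tensor:
    """Run the frozen teacher for many denoising steps (the slow path)."""
    with torch.no_grad():
        for _ in range(steps):
            x = teacher(x)
    return x

noise = torch.randn(8, 3, 64, 64)
target = teacher_sample(noise)       # expensive reference output
pred = student(noise)                # one-step student prediction
loss = F.mse_loss(pred, target)      # pull the student toward the teacher

opt.zero_grad()
loss.backward()
opt.step()
```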
The important numbers
The NVIDIA team distilled the 14-billion-parameter Wan2.1 text-to-video model into a few-step generator. Training time: 16 hours on 64 H100 GPUs. The distilled model runs 50x faster than its teacher while maintaining similar visual quality.
On standard benchmarks, FastGen’s implementations matched or beat the results reported in the original research papers. The DMD2 implementation scored 1.99 FID on CIFAR-10 (lower is better; the paper reported 2.13) and 1.12 on ImageNet-64 against the original 1.28.
Weather modeling benefits as well. NVIDIA’s CorrDiff weather downscaling model, distilled with FastGen, now runs 23x faster while matching the original’s prediction accuracy.
Why this matters to developers
The plug-and-play architecture is a real selling point. Developers take a diffusion model, choose a distillation method, and FastGen handles the transformation pipeline. There’s no need to rewrite your training infrastructure or navigate incompatible codebases.
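FastGen’s actual entry points are documented in the repo; purely as a sketch of what a “pick a method, keep the pipeline” design can look like, here is a toy dispatch layer. Every name below is hypothetical, not FastGen’s real API:

```python
from typing import Callable, Dict

# Hypothetical plug-and-play dispatch layer -- illustrative names only,
# NOT FastGen's actual API.
def ict_distill(teacher: str) -> str:        # trajectory-based placeholder
    return f"few-step generator ({teacher}, iCT)"

def dmd_distill(teacher: str) -> str:        # distribution-based placeholder
    return f"few-step generator ({teacher}, DMD)"

METHODS: Dict[str, Callable[[str], str]] = {
    "ict": ict_distill,
    "dmd": dmd_distill,
}

def distill(teacher: str, method: str) -> str:
    """Single entry point: swap distillation methods by name without
    touching the rest of the training pipeline."""
    return METHODS[method](teacher)

print(distill("wan2.1-t2v-14b", method="dmd"))
```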
Supported optimizations include FSDP2, automatic mixed precision, context parallelism, and efficient KV cache management. The library ships with support for NVIDIA’s Cosmos-Predict2.5, Wan2.1, and Wan2.2, and extends to non-vision applications.
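The first two of those are standard PyTorch features rather than anything FastGen-specific. A generic sketch of FSDP2 sharding plus bf16 mixed precision, assuming PyTorch 2.6+ and a torchrun launch, looks like this:

```python
import os
import torch
from torch.distributed.fsdp import fully_shard  # FSDP2 API, PyTorch >= 2.6

# Generic PyTorch usage, not FastGen-specific code. Launch under torchrun
# so a process group exists, e.g.: torchrun --nproc_per_node=8 train.py
torch.distributed.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

model = torch.nn.Transformer(d_model=512).cuda()
fully_shard(model)  # FSDP2: shard parameters across data-parallel ranks

# Build the optimizer after sharding so it sees the sharded parameters.
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
src = torch.randn(16, 4, 512, device="cuda")  # (seq, batch, embed)
tgt = torch.randn(16, 4, 512, device="cuda")

# Automatic mixed precision: matmul-heavy ops run in bf16, weights stay fp32.
with torch.autocast("cuda", dtype=torch.bfloat16):
    loss = model(src, tgt).pow(2).mean()  # dummy loss for illustration
loss.backward()
opt.step()
```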
Interactive world models, systems that simulate environments reacting in real time to user actions, are getting particular attention. FastGen implements causal distillation methods such as CausVid and Self-Forcing that transform video diffusion models into autoregressive generators suitable for real-time interaction.
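To make the autoregressive pattern concrete, here is a toy interaction loop. The generator is a placeholder, not CausVid or Self-Forcing: each new frame depends only on past frames and the latest user action, which is what lets the model respond step by step instead of rendering a whole clip at once:

```python
import torch

def next_frame(context: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """Toy causal generator: the new frame depends only on past frames and
    the current action. A real model would run a few denoising steps here."""
    return 0.9 * context.mean(dim=0) + 0.1 * action  # placeholder dynamics

frames = [torch.zeros(3, 64, 64)]               # initial frame
for step in range(30):                          # 30 interactive steps
    action = torch.randn(3, 64, 64)             # stand-in for user input
    context = torch.stack(frames[-8:])          # bounded history window,
                                                # the role a KV cache plays
    frames.append(next_frame(context, action))  # emit one frame per step
```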
The competitive landscape
This release comes as diffusion model research continues to explode across the industry. The literature has grown rapidly over the past year, spanning image generation, video synthesis, 3D asset creation, and scientific simulation. NVIDIA also announced its Earth-2 suite of open weather models on January 26, signaling its broader AI infrastructure ambitions.
FastGen is now available on GitHub. The real test is whether third-party developers can actually achieve 100x speedups on their own models, or whether the gains are limited to NVIDIA’s carefully optimized examples.
Image source: Shutterstock
