Artificial intelligence (AI) image generation is accelerating on several fronts. Recent developments have pushed the industry from steady progress to constant innovation, and now promise the arrival of real-time, high-fidelity image generation.
These tools aren’t slow. Waiting a minute for an image isn’t much of a burden. But users still demand more: more realism, more versatility, more variety, and more speed. On that last front, researchers are happy to deliver.
SDXL Turbo presses the accelerator pedal.
Stability AI has unveiled SDXL Turbo, which could represent a monumental leap forward in AI image creation. We don’t say this lightly: the newly released model can generate an image in under a second, instead of the 30 to 60 seconds a typical generator takes. That is about as close to real-time AI image generation as the field has come.
SDXL Turbo differs from all previous Stable Diffusion models. Its Adversarial Diffusion Distillation (ADD) technique dramatically reduces the number of steps required to produce a high-quality image: where a typical generation takes 30 to 100 steps, ADD can cut that to just one. “ADD is the first method to unlock single-step, real-time image synthesis with foundation models,” Stability AI claims in its research paper.
SDXL Turbo combines adversarial training with score distillation to streamline the generation process, ensuring images are produced quickly while maintaining high fidelity.
As a result, SDXL Turbo can produce complex, high-resolution images almost instantly. The approach has also rekindled interest in generative adversarial networks (GANs), which had been largely sidelined since diffusion techniques began to dominate the field.
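For readers who want to try it, here is a minimal sketch of single-step generation with SDXL Turbo using Hugging Face’s diffusers library; the model ID and parameters follow the published usage, but exact details may vary between diffusers versions.

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the SDXL Turbo weights in half precision (assumes a CUDA GPU).
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sdxl-turbo", torch_dtype=torch.float16, variant="fp16"
)
pipe.to("cuda")

# ADD-distilled models need only a single denoising step, and
# classifier-free guidance is disabled (guidance_scale=0.0).
image = pipe(
    prompt="a cinematic photo of a lighthouse at dawn",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("turbo.png")
```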
Latent Consistency Models promise efficiency.
But if you don’t want to say goodbye to the “old” Stable Diffusion models, researchers have a solution.
Alongside SDXL Turbo, recent advances include Latent Consistency Models (LCMs) and LCM-LoRA, each contributing uniquely to the field.
The LCM, presented in a dedicated research paper, stands out for operating efficiently within the latent space of pre-trained autoencoders such as Stable Diffusion’s, aiming to speed up high-resolution image creation without sacrificing quality. It uses a one-stage guided distillation method to transform a pre-trained diffusion model into a fast image generator, skipping unnecessary sampling steps.
In practice, users don’t need to change anything else. Simply download an LCM model and use it like a regular SDXL checkpoint, but dial the step count down to a minimum: instead of computing 25, 50, or 75 generation steps per image, the model produces a good image in about 4 steps, in seconds.
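As a concrete illustration, here is a hedged sketch of running a distilled LCM checkpoint with diffusers. The latent-consistency/lcm-sdxl UNet and the LCMScheduler follow the documented usage, though step counts and guidance values may need tuning.

```python
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, LCMScheduler

# Swap the SDXL base UNet for the LCM-distilled one.
unet = UNet2DConditionModel.from_pretrained(
    "latent-consistency/lcm-sdxl", torch_dtype=torch.float16, variant="fp16"
)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    unet=unet,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# LCMs rely on a dedicated consistency-model scheduler.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# 4 steps instead of 25-75; guidance is folded into the distilled model.
image = pipe(
    prompt="a watercolor painting of a mountain village",
    num_inference_steps=4,
    guidance_scale=8.0,
).images[0]
image.save("lcm.png")
```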
There are already great models that ship with their own LCM versions. We recommend Hephaistos_NextGENXL for its versatility, but there are plenty of other models worth testing.
LCM-LoRA: a turbocharger for all models
LCM-LoRA, released alongside LCM, provides a general-purpose acceleration module that can be plugged into a variety of Stable Diffusion models. “LCM-LoRA can be viewed as a plug-in neural PF-ODE solver with strong generalization capabilities,” the research paper states.
LCM-LoRA is designed to make existing Stable Diffusion models faster and more versatile. It uses Low-Rank Adaptation (LoRA) to update the pre-trained weight matrices with small add-on matrices, reducing computational load and memory requirements.
Applied to a generic Stable Diffusion model, LCM-LoRA dramatically improves image generation speed, which makes it effective across a wide variety of tasks. Users don’t even need to download new base models: simply enable the LCM-LoRA and generate images as quickly as with a full LCM model.
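In diffusers terms, that plug-in workflow looks roughly like the sketch below; the latent-consistency/lcm-lora-sdxl repository and the load_lora_weights call follow the documented usage, but treat the exact values as assumptions to tune.

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Start from a standard SDXL pipeline...
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# ...switch to the LCM scheduler and attach the acceleration LoRA.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# Few-step sampling with low guidance is the recommended regime for LCM-LoRA.
image = pipe(
    prompt="an isometric illustration of a cozy coffee shop",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora.png")
```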
LCM-LoRA weights are available for download for both SD 1.5 and SDXL.
Quality vs. Speed
Despite these technological leaps, speed and image quality still have to be balanced. Fast generation tools like SDXL Turbo and LCM-LoRA accelerate the creative process but sacrifice some image fidelity: an image generated in 50 steps with a good model will generally show more detail and fidelity than one generated in 5 steps with a good LCM model.
However, this trade-off is softened by the typical workflow, in which many images are generated in search of the perfect one. Follow-up passes with tools such as image-to-image or inpainting can compensate for the initial loss of quality by adding detail to those first drafts. A properly refined image produced by one of these fast techniques can look as good as one generated by a regular Stable Diffusion model.
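One hedged sketch of that refinement pass, assuming a fast draft saved from one of the models above and using the image-to-image pipeline in diffusers (the strength and step values here are illustrative guesses):

```python
import torch
from diffusers import AutoPipelineForImage2Image
from diffusers.utils import load_image

# Refine a fast first draft with a full SDXL pass.
pipe = AutoPipelineForImage2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

draft = load_image("turbo.png")  # a draft from SDXL Turbo or an LCM model

# A moderate strength keeps the composition but regenerates fine detail.
refined = pipe(
    prompt="a cinematic photo of a lighthouse at dawn, highly detailed",
    image=draft,
    strength=0.5,
    num_inference_steps=50,
).images[0]
refined.save("refined.png")
```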
The AI image creation space is in overdrive, and few are hungrier for speed than AI fanboys, so buckle up.