SDXL Turbo: The Real-Time AI Image-Synthesis Model by Stability AI

Stability AI, a leading company in AI image synthesis, has launched a new model called Stable Diffusion XL Turbo, or SDXL Turbo for short. The model generates realistic images from a written prompt or a source image in a fraction of the time required by previous models. The company claims that SDXL Turbo can achieve “real-time” image generation, which could make it a game-changer for speed-sensitive applications such as interactive image editing.

How SDXL Turbo works

SDXL Turbo is based on the Stable Diffusion XL (SDXL) model, which uses a diffusion process to gradually transform random noise into a target image. However, unlike SDXL, which typically requires 20–50 steps to produce a high-quality image, SDXL Turbo can do it in a single step. This is possible thanks to a novel technique called Adversarial Diffusion Distillation (ADD).

ADD combines two training signals to improve the efficiency and realism of the model. The first is score distillation, in which the model learns from an existing image-synthesis model, such as SDXL, that acts as a teacher. The second is an adversarial loss: a discriminator network learns to distinguish real images from generated ones, and the model is trained to fool it, which pushes its output toward greater realism.
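As a rough illustration of how two such training signals can be combined into one objective, here is a minimal sketch in plain Python. It is not Stability AI's implementation: the function names, the exact form of each term (a negated mean discriminator score and a mean squared error against the teacher), and the `distill_weight` parameter are all assumptions chosen for clarity.

```python
from statistics import fmean


def adversarial_term(disc_scores_on_generated):
    """Generator-side adversarial term (assumed form).

    Higher discriminator scores mean "looks real", so the generator
    lowers this term by fooling the discriminator.
    """
    return -fmean(list(disc_scores_on_generated))


def distillation_term(student_pixels, teacher_pixels):
    """Distillation term (assumed form): mean squared distance
    between the student's output and the teacher's output."""
    return fmean([(s - t) ** 2 for s, t in zip(student_pixels, teacher_pixels)])


def combined_loss(disc_scores, student_pixels, teacher_pixels, distill_weight=1.0):
    """Weighted sum of both terms; distill_weight is a hypothetical knob."""
    return adversarial_term(disc_scores) + distill_weight * distillation_term(
        student_pixels, teacher_pixels
    )


# Toy values standing in for real tensors.
loss = combined_loss(
    disc_scores=[0.2, -0.4],
    student_pixels=[0.5, 0.5],
    teacher_pixels=[0.5, 1.0],
    distill_weight=2.0,
)
print(f"combined loss: {loss:.3f}")
```

In this toy setup, the loss falls either when the discriminator rates the generated samples as more realistic or when the student's output moves closer to the teacher's, which is the intuition behind pairing the two signals.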

Stability AI has published a research paper that explains the details of the ADD technique and its advantages over other methods. According to the paper, one of those advantages is that SDXL Turbo resembles Generative Adversarial Networks (GANs), which are widely used for image synthesis, in that it can produce an image in a single step.

How it performs

SDXL Turbo is not meant to replace SDXL, but rather to complement it. Stable Diffusion XL Turbo images are not as detailed as SDXL images produced at higher step counts, but they are much faster to generate. For some applications, such as interactive image editing or video generation, speed may be more important than detail.

To test the speed of SDXL Turbo, we ran it locally on an Nvidia RTX 3060 using Automatic1111, a popular web interface for running Stable Diffusion models. We found that SDXL Turbo can generate a 3-step 1024×1024 image in about 4 seconds, compared to 26.4 seconds for a 20-step SDXL image with similar detail. Smaller images generate even faster (under one second for 512×768).
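To put those timings in perspective, the short calculation below works out the overall speed-up and the cost per step. It uses only the figures quoted above (treating "about 4 seconds" as exactly 4.0, which is an approximation); the variable and function names are ours.

```python
def seconds_per_step(total_seconds, steps):
    """Average wall-clock time spent on each sampling step."""
    return total_seconds / steps


# Figures from the RTX 3060 test described above.
turbo_seconds, turbo_steps = 4.0, 3    # "about 4 seconds", so approximate
sdxl_seconds, sdxl_steps = 26.4, 20

speedup = sdxl_seconds / turbo_seconds
turbo_per_step = seconds_per_step(turbo_seconds, turbo_steps)
sdxl_per_step = seconds_per_step(sdxl_seconds, sdxl_steps)

print(f"overall speed-up: {speedup:.1f}x")
print(f"SDXL Turbo: {turbo_per_step:.2f} s/step")
print(f"SDXL:       {sdxl_per_step:.2f} s/step")
```

On these numbers, each step costs roughly the same for both models (about 1.3 seconds), so the saving comes almost entirely from needing fewer steps rather than from faster steps.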

Of course, a more powerful graphics card, such as an RTX 3090 or 4090, would generate images faster still. We also noticed that SDXL Turbo images have the best detail at around 3–5 steps per image, contrary to Stability’s marketing, which emphasizes single-step generation.
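For readers who prefer a script to a web interface, the step count can also be set in code. The sketch below assumes the Hugging Face diffusers library and the model ID `stabilityai/sdxl-turbo`, neither of which was part of our test; the helper names `turbo_call_kwargs` and `generate` are hypothetical. Treat it as a starting point rather than a verified recipe.

```python
# Sketch only: the library (diffusers), the model ID, and the helper names
# below are assumptions for illustration, not details from our test setup.

TURBO_MODEL_ID = "stabilityai/sdxl-turbo"  # assumed Hugging Face model ID


def turbo_call_kwargs(prompt, steps=1):
    """Build the keyword arguments for one generation call."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return {
        "prompt": prompt,
        "num_inference_steps": steps,
        # Assumption: Turbo is run without classifier-free guidance.
        "guidance_scale": 0.0,
    }


def generate(prompt, steps=1):
    """Load the pipeline and return one image.

    Needs a CUDA GPU, the torch and diffusers packages, and the weights.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from diffusers import AutoPipelineForText2Image

    pipe = AutoPipelineForText2Image.from_pretrained(
        TURBO_MODEL_ID, torch_dtype=torch.float16, variant="fp16"
    )
    pipe.to("cuda")
    return pipe(**turbo_call_kwargs(prompt, steps)).images[0]
```

Calling `generate("a red fox in the snow", steps=4)` would request a four-step image, which falls in the 3–5 step range where we saw the best detail, while the default of one step matches the single-step mode Stability advertises.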

How to get SDXL Turbo

Currently, SDXL Turbo is only available under a non-commercial research license, which limits its use to personal, non-commercial purposes. This has drawn some criticism from the Stable Diffusion community, which has been eagerly waiting for a commercial license. However, Stability AI has stated that it is open to commercial applications and invites interested parties to contact the company for more information. Researchers and developers can access the SDXL API to integrate SDXL Turbo into their projects, facilitating further experimentation and innovation.

Meanwhile, Stability AI itself has been facing internal management issues, with an investor recently calling for the resignation of CEO Emad Mostaque. The company has also reportedly been exploring a possible sale to a larger entity, but that has not affected its pace of innovation. Just last week, the company announced Stable Video Diffusion, a model that can turn still images into short video clips.

SDXL Turbo is a remarkable achievement in the field of AI image synthesis and demonstrates the potential of Stability AI’s technology. We look forward to seeing more applications and developments from this company in the future.
