Stable Diffusion 3: A New AI Image Generator by Stability AI

Stability AI, a company known for its open and innovative image-synthesis models, has unveiled its latest creation: Stable Diffusion 3 (SD3). This is a next-gen AI image generator that can produce realistic and detailed images from text descriptions, also known as prompts. Stability claims that SD3 is a significant improvement over its previous models in terms of quality, accuracy, and scalability.

What is Stable Diffusion 3 and how does it work?

Stable Diffusion 3 is a family of models ranging from 800 million to 8 billion parameters. Parameters are the numbers that determine how the model behaves and what it can generate. The larger the model, the more detail and complexity it can handle. However, larger models also need more computing power and memory to run.
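As a rough illustration of why larger models need more memory, the sketch below estimates the space taken by the weights alone at the two ends of the parameter range. The bytes-per-parameter figures are the standard sizes for 32-bit and 16-bit floating point; the function name is made up for this example, and the estimate ignores activations, text encoders, and any other runtime overhead, so real requirements would be higher.

```python
# Back-of-the-envelope estimate of weight storage only.
# Assumption: every parameter is stored in the same floating-point format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params, dtype="float16"):
    """Return the approximate size of the weights in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in [("800M", 800_000_000), ("8B", 8_000_000_000)]:
        for dtype in ("float16", "float32"):
            size = weight_memory_gb(params, dtype)
            print(f"{label} parameters in {dtype}: about {size:.1f} GB")
```

On these assumptions, the smallest model's weights fit in under 2 GB at 16-bit precision, while the largest needs roughly ten times that.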

Stability says that Stable Diffusion 3 can run locally on various devices, from smartphones to servers, depending on the model size. This gives users more flexibility and control over their image generation. Users can also fine-tune the models to change their outputs according to their preferences and needs.

Stable Diffusion 3 uses a novel architecture called a diffusion transformer, which borrows from transformers, a type of AI model that is good at processing sequences and patterns. Unlike conventional image-synthesis models built on a U-Net backbone, a diffusion transformer splits the image into small patches and gradually transforms them from random noise into a coherent picture.
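To make the idea of working on small pieces of the image concrete, here is a minimal sketch of how a grid of pixel values can be cut into non-overlapping square patches, each flattened into one item of a sequence that a transformer could process. This is a toy illustration, not Stability's code: the function name and the tiny 4x4 "image" are invented for the example, and production models typically apply this step to a compressed representation of the image rather than raw pixels.

```python
def patchify(image, patch):
    """Split a 2D grid (a list of rows) into flattened patch x patch tiles.

    Tiles are returned in row-major order, so the result is a sequence
    of "tokens", each holding patch * patch values.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image dimensions must be divisible by the patch size")

    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = [
                image[row][col]
                for row in range(top, top + patch)
                for col in range(left, left + patch)
            ]
            tokens.append(tile)
    return tokens


if __name__ == "__main__":
    # A hypothetical 4x4 single-channel image with values 0..15.
    image = [[row * 4 + col for col in range(4)] for row in range(4)]
    for token in patchify(image, patch=2):
        print(token)
```

With a patch size of 2, the 4x4 grid becomes a sequence of four tokens of four values each.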

Stable Diffusion 3 also employs a technique called flow matching, which helps the model learn how to smoothly transition from noise to image without simulating every intermediate step. Instead, the model focuses on the overall direction or flow of the image creation.
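The "direction or flow" idea can be sketched with a toy example. The code below assumes the simplest possible path between noise and image, a straight line, which is one common way to set up flow matching; the article does not say which exact formulation SD3 uses, so treat this as an assumption. The function names are hypothetical, the "image" is just three numbers, and time runs from 0 (noise) to 1 (data) by convention chosen for this example. In a real system a neural network would be trained to predict the velocity; here the exact velocity stands in for it.

```python
def interpolate(noise, data, t):
    """Point on the straight line from noise (t=0) to data (t=1)."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]


def target_velocity(noise, data):
    """Direction the model is trained to predict along the straight path."""
    return [d - n for n, d in zip(noise, data)]


def euler_sample(velocity_fn, noise, steps):
    """Follow a velocity field from noise toward data in equal steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.5, -1.0, 2.0]   # stand-in for random noise
    data = [1.0, 0.0, -1.0]    # stand-in for a training image
    v = target_velocity(noise, data)

    # Stand-in for a trained network: always returns the exact velocity.
    def oracle(x, t):
        return v

    print("halfway point:", interpolate(noise, data, 0.5))
    print("after 4 steps:", euler_sample(oracle, noise, steps=4))
```

Because the path is a straight line, following the correct velocity reaches the data in a handful of steps, which is the intuition behind not having to simulate every intermediate stage.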

Image: Stable Diffusion 3 (Source: Stability AI)

How does SD3 compare to other image-synthesis models?

Stable Diffusion 3 is not the first image-synthesis model that Stability has developed. Since 2022, the company has launched a series of models, such as Stable Diffusion 1.4, 1.5, 2.0, 2.1, XL, and XL Turbo. These models have been open-weight and source-available, meaning that anyone can download and run them for free. Stability has positioned itself as a more open alternative to proprietary models, such as OpenAI’s DALL-E 3, which is only accessible through an API.

However, Stability’s models have also faced some challenges and controversies, such as the use of copyrighted training data, the presence of bias, and the risk of abuse. These issues have led to legal disputes that are still ongoing.

We have not tested Stable Diffusion 3 ourselves, but based on the samples that Stability has shared on its website and social media, the model seems to generate high-quality images that are comparable to other state-of-the-art models, such as DALL-E 3, Adobe Firefly, Imagine with Meta AI, Midjourney, and Google Imagen. Stability AI also offers a Stable Diffusion API, which lets developers use SD3 in their own applications and services.

One of the notable features of SD3 is its ability to generate accurate and legible text within images, which was a weakness of earlier models. Another is its prompt fidelity, meaning how closely the output follows the text description the user provides. From the examples we have seen, SD3 appears to have similar prompt fidelity to DALL-E 3, but we cannot confirm that without further testing.

How can you try SD3?

Stability has not released Stable Diffusion 3 to the public yet, but it has opened a waitlist for those who are interested in trying it. You can sign up on the company's website and wait for an invitation. Stability says that it will release the weights of SD3 for free once the testing phase is over. The company says that this phase is important for gathering feedback and improving the performance and safety of the model before making it widely available.

Stability is also experimenting with other image-synthesis architectures, such as Stable Cascade, which uses a three-stage process for text-to-image synthesis. The company says it constantly explores new ways to create realistic and diverse images with AI.