Image model · Aug 2022 · Stability AI

Stable Diffusion

Open-Weight Latent Diffusion for Images

An image generator released with open weights and a permissive license that, within weeks, ran on consumer graphics cards in apartments around the world. It was the first generative AI model with quality comparable to closed competitors (DALL·E 2, Midjourney) that anyone could download, modify, and commercialize without permission.

Cost

Free

Open weights — self-host

Why it matters

Stable Diffusion is the image-generation equivalent of what Llama became for text generation a year later. It proved that open-weight models could ship at frontier quality, a precedent that now shapes debates about LLM regulation, competition with China, and the viability of closed-model moats.

Core Capabilities

Generative

Produces images, video, audio, or other media.

Vision

Understands images, scenes, and visual context.

Multimodal

Combines text, vision, and audio in one model.

Context Window

Context window not disclosed.

Availability

API

Not available

Product / App

Available

Open Source

Released

Enterprise

Contact sales

Pricing Model

Free / self-host

Open weights — pay only for compute.

Self-host

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Quality: 5.0 (placeholder; no data reported)
Speed: 5.0 (placeholder; no data reported)
Control: 5.0 (placeholder; no data reported)
Consistency: 5.0 (placeholder; no data reported)

Lower 20%: 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: 80th percentile; only 20% of models score above this.

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.

What it feels like

First text-to-image model with DALL·E 2-class quality and permissive open weights
Latent diffusion innovation — denoising in compressed latent space, not pixel space — made consumer-GPU inference viable
Ran in under 10 GB of VRAM at release, with community optimizations soon bringing it below 8 GB; the first generative model regular people could use locally
Spawned an enormous ecosystem (LoRAs, ControlNet, AUTOMATIC1111, ComfyUI) within months
Triggered the 'AI art' cultural moment of late 2022 along with Midjourney
Lower default photorealism than DALL·E 2 / Midjourney v3, but the ability to fine-tune and extend the weights outweighed that gap for many users
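
The latent-space point in the list above can be made concrete with a little arithmetic. The sketch below assumes the commonly cited Stable Diffusion v1 configuration (512×512 RGB output, an autoencoder that downsamples each side by a factor of 8, and 4 latent channels). Those numbers are not stated on this page, and the function names are illustrative only.

```python
def tensor_elements(height: int, width: int, channels: int) -> int:
    """Number of values in an image-like tensor of the given shape."""
    return height * width * channels


def latent_shape(height: int, width: int, downsample: int = 8, latent_channels: int = 4):
    """Shape of the latent the denoiser works on (assumed v1-style autoencoder)."""
    if height % downsample or width % downsample:
        raise ValueError("image sides must be divisible by the downsampling factor")
    return height // downsample, width // downsample, latent_channels


# Pixel-space tensor for a 512x512 RGB image.
pixel_values = tensor_elements(512, 512, 3)

# Latent tensor for the same image under the assumed configuration.
lat_h, lat_w, lat_c = latent_shape(512, 512)
latent_values = tensor_elements(lat_h, lat_w, lat_c)

compression = pixel_values / latent_values

print(f"pixel space : {pixel_values} values")
print(f"latent space: {latent_values} values ({lat_h}x{lat_w}x{lat_c})")
print(f"reduction   : {compression:.0f}x fewer values per denoising step")
```

Under these assumptions each denoising step operates on roughly 48 times fewer values than a pixel-space model would, which goes a long way toward explaining why inference fits on a consumer GPU.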

Reviews: Stability AI — Stable Diffusion public release ↗ · GitHub — CompVis/stable-diffusion ↗ · BentoML — Best open-source image generation models ↗

Best use cases

Self-hosted image generation pipelines (privacy / volume / customisation)
Custom-style fine-tuning via LoRA / textual inversion / DreamBooth
ControlNet-style guided generation requiring weight access
Research and academic experimentation on diffusion models
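
As a rough illustration of why LoRA-style fine-tuning is cheap enough to run at home, the sketch below compares the trainable parameters of a full weight update with those of a rank-r update of the form W + (alpha / r) · B · A. It is plain Python and not tied to any library. The 768×768 layer size, the rank, and alpha are illustrative assumptions, not values taken from Stable Diffusion's actual configuration.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole d_out x d_in weight matrix is tuned."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(weight, b, a, alpha: float):
    """Return weight + (alpha / rank) * (B @ A), leaving the frozen weight untouched."""
    rank = len(a)  # A has one row per rank dimension
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Parameter counts for a hypothetical 768x768 projection at rank 4.
full = full_update_params(768, 768)
lora = lora_update_params(768, 768, rank=4)
print(f"full fine-tune: {full} values, LoRA rank 4: {lora} values")

# Tiny worked example: a 2x2 frozen weight with a rank-1 adapter.
frozen = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # 2 x 1
A = [[0.5, 0.0]]     # 1 x 2
merged = apply_lora(frozen, B, A, alpha=1.0)
print(merged)
```

Because only B and A are trained and stored, an adapter for a layer like this is a small fraction of the size of the layer itself, which is one reason community LoRA files are easy to share and swap.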

Tools to try

Stability Platform · Hugging Face · ComfyUI

Not ideal for

Out-of-the-box photorealistic aesthetics (Midjourney is still the default for that)
Reliable in-image text rendering (FLUX.2 and later models leapfrogged here)
Production deployments without a community-finetuned LoRA / model card

Model Evolution

Rombach, R. · Blattmann, A. · Lorenz, D. · Esser, P. · Ommer, B.