Stable Diffusion
An image generator released with open weights and a permissive license that, within weeks, ran on consumer graphics cards in apartments around the world. It was the first generative AI model with quality comparable to closed competitors (DALL·E 2, Midjourney) that anyone could download, modify, and commercialize without permission.
How are Intelligence, Speed & Cost bucketed?
Intelligence
- Top 1%: ≤ 1%
- Top 5%: ≤ 5%
- Top 10%: ≤ 10%
- Good: ≤ 25%
- Medium: ≤ 50%
- Below avg: > 50%
Speed
- Top 1%: ≥ 345 tok/s
- Top 5%: ≥ 237 tok/s
- Top 10%: ≥ 196 tok/s
- Good: ≥ 146 tok/s
- Medium: ≥ 90 tok/s
- Slow: < 90 tok/s
Cost
- Free: open weights · self-host
- Low: < $1 / M out
- Moderate: $1–5 / M out
- High: ≥ $5 / M out
Why it matters
Stable Diffusion was to image generation what Llama became for text generation about a year later. It showed that open-weight models could ship at near-frontier quality, a precedent that still shapes debates about LLM regulation, competitiveness with China, and whether closed-model moats can hold.
Core Capabilities
Context Window
Context window not disclosed.
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
What it feels like
- First text-to-image model with DALL·E 2-class quality and permissive open weights
- Latent diffusion innovation — denoising in compressed latent space, not pixel space — made consumer-GPU inference viable
- Ran in under 8 GB of VRAM at release, making it the first image generator of this quality that ordinary users could run locally
- Spawned an enormous ecosystem (LoRAs, ControlNet, AUTOMATIC1111, ComfyUI) within months
- Triggered the 'AI art' cultural moment of late 2022 along with Midjourney
- Lower default photorealism than DALL·E 2 / Midjourney v3, but customisability flipped the trade-off: users could fine-tune and extend the weights to suit their own style
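To make the latent-space point above concrete, here is a rough back-of-the-envelope sketch. It assumes the commonly cited Stable Diffusion v1 configuration (512×512 RGB output, an autoencoder with 8× spatial downsampling and 4 latent channels). Those numbers are not stated on this page, and the helper name `tensor_elements` is hypothetical.

```python
def tensor_elements(height: int, width: int, channels: int) -> int:
    """Number of scalar values in an image-like tensor."""
    return height * width * channels


# Assumed Stable Diffusion v1 settings (not taken from this page).
IMAGE_SIZE = 512       # output resolution, pixels per side
RGB_CHANNELS = 3
DOWNSAMPLE = 8         # autoencoder spatial compression factor
LATENT_CHANNELS = 4

pixel_elems = tensor_elements(IMAGE_SIZE, IMAGE_SIZE, RGB_CHANNELS)
latent_side = IMAGE_SIZE // DOWNSAMPLE
latent_elems = tensor_elements(latent_side, latent_side, LATENT_CHANNELS)

print(pixel_elems)                   # 786432 values in pixel space
print(latent_elems)                  # 16384 values in latent space
print(pixel_elems // latent_elems)   # 48x fewer values per denoising step
```

Under these assumptions the denoiser works on roughly 48 times fewer values than a pixel-space model would, which is one plausible reading of why inference fit on consumer GPUs.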
Best use cases
- Self-hosted image generation pipelines (privacy / volume / customisation)
- Custom-style fine-tuning via LoRA / textual inversion / Dreambooth
- ControlNet-style guided generation requiring weight access
- Research and academic experimentation on diffusion models
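As a rough illustration of why LoRA-style fine-tuning is cheap enough for self-hosted use, the sketch below compares trainable parameter counts for one weight matrix. It assumes the standard LoRA formulation (a frozen weight of shape d_out × d_in plus two trainable low-rank factors of rank r). The layer size and rank are illustrative values, not figures from this page, and the function names are hypothetical.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable values when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values for a LoRA adapter: B is d_out x rank, A is rank x d_in."""
    return rank * (d_out + d_in)


# Illustrative numbers only: a square 768-wide projection and rank 4.
D_OUT, D_IN, RANK = 768, 768, 4

full = full_finetune_params(D_OUT, D_IN)
lora = lora_params(D_OUT, D_IN, RANK)

print(full)          # 589824
print(lora)          # 6144
print(full // lora)  # 96
```

With these example values the adapter trains about 1/96 of the parameters of the full matrix; actual savings depend on which layers are adapted and the rank chosen.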
Not ideal for
- Out-of-the-box photorealistic aesthetics (Midjourney is still the default for that)
- Reliable in-image text rendering (FLUX.2 and later models leapfrogged it here)
- Production deployments that lack a community fine-tuned LoRA or a documented model card