Techniques
Plain-English explainers for the architectures, training methods, and inference techniques behind the models in this tree. Built to help you read a model page and actually know what its specs mean.
Architectures (3)
Diffusion Models
A generative method that learns to reverse a gradual noising process, producing samples by iteratively denoising random noise. Dominant for image, video, and audio generation since 2022.
Mixture-of-Experts (MoE)
A Transformer variant that routes each token to a small subset of expert sub-networks. Decouples total parameters from per-token compute.
Transformer
A neural network architecture built on self-attention. Replaced recurrent and convolutional sequence models for almost every task by 2020.
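The Mixture-of-Experts entry above says routing decouples total parameters from per-token compute. A minimal sketch of that idea, assuming simple top-k routing with a softmax over the selected router scores. The function names, the router scores, and all sizes here are hypothetical, chosen only for illustration, and real MoE models differ in routing details:

```python
import math

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    # Rank experts by router score; ties keep their original order.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Softmax over the selected scores only, so the k weights sum to 1.
    m = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - m) for i in top]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top, exps)]

def moe_param_counts(shared_params, params_per_expert, num_experts, k):
    """Parameters stored in the model vs. parameters used for one token."""
    total = shared_params + num_experts * params_per_expert
    active = shared_params + k * params_per_expert
    return total, active

# Hypothetical router scores for one token across 8 experts.
routes = top_k_route([0.1, 2.0, -1.0, 2.0, 0.5, 0.0, -0.3, 1.0], k=2)

# Hypothetical sizes: 8 experts stored, only 2 run for each token.
total, active = moe_param_counts(
    shared_params=1_000, params_per_expert=500, num_experts=8, k=2
)

print("selected experts and weights:", routes)
print("total parameters:", total, "active per token:", active)
```

Adding experts raises the total count, while the per-token active count depends only on k. This is why MoE model pages often list both a total and an active parameter figure.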