Techniques

Plain-English explainers for the architectures, training methods, and inference techniques behind the models in this tree. Built to help you read a model page and actually know what its specs mean.

Architectures 3

Diffusion Models

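A generative model trained to reverse a gradual noising process: it starts from random noise and denoises step by step until a sample emerges. The dominant approach for image, video, and audio generation since about 2022.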
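To make the noising process concrete, here is a minimal sketch of the forward (noise-adding) half only, assuming a DDPM-style linear variance schedule. The function names, the 1000-step schedule, and the beta range are illustrative assumptions, not something this page specifies. The learned part of a diffusion model, a network trained to predict the added noise, is left out.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, evenly spaced (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0; returns the noisy sample and the noise used."""
    signal_scale = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal_scale * x + noise_scale * e for x, e in zip(x0, noise)]
    return x_t, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
x0 = [1.0, -0.5, 0.25]                              # toy "data" vector
x_early, _ = add_noise(x0, 10, alpha_bars, rng)     # still close to x0
x_late, _ = add_noise(x0, 999, alpha_bars, rng)     # almost pure noise
```

Training then amounts to showing a network `x_t` and `t` and asking it to recover `noise`; generation runs that prediction in reverse, starting from pure noise.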

Mixture-of-Experts (MoE)

A Transformer variant that routes each token to a small subset of expert sub-networks. Because only the selected experts run, total parameter count is decoupled from per-token compute.
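The routing step can be sketched in a few lines. This is a toy illustration assuming top-k gating with softmax-renormalised weights, which is one common scheme and not necessarily what any given model uses. In a real layer the router scores come from a learned projection of the token; here `router_logits` is passed in by hand, and the "experts" are hypothetical scaling functions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalise their gate weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))

def moe_layer(token, router_logits, experts, k=2):
    """Run only the selected experts and sum their gate-weighted outputs."""
    routes = route_top_k(router_logits, k)
    output = [0.0] * len(token)
    for idx, gate in routes:
        expert_out = experts[idx](token)  # the other experts are never evaluated
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, [idx for idx, _ in routes]

# Four hypothetical experts; each just scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
token = [1.0, -1.0]
router_logits = [0.1, 2.0, -1.0, 2.0]   # assumed router output for this token
output, used = moe_layer(token, router_logits, experts, k=2)
```

With four experts and `k=2`, the layer holds four experts' worth of parameters but spends only two experts' worth of compute on each token.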

Transformer

A neural network architecture built on self-attention, in which every token in a sequence can draw information directly from every other token. By around 2020 it had displaced recurrent and convolutional models for most sequence tasks.
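As a rough illustration of self-attention, the sketch below implements single-head scaled dot-product attention in plain Python. It is deliberately simplified: a real Transformer layer first maps tokens through learned query, key, and value projections and uses multiple heads, whereas here the raw token vectors are assumed to serve as all three.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of values."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Self-attention: queries, keys, and values all come from the same sequence.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = attention(tokens, tokens, tokens)
```

Each row of `contextual` is a blend of all three input tokens, weighted by how strongly that position's query matches each key.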

Training 1

RLHF (Reinforcement Learning from Human Feedback)

A post-training method that adjusts a model's behavior using human preference data. The technique used to turn GPT-3.5-series models into ChatGPT.
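One way preference data typically enters training is through a reward model fitted to pairwise comparisons. The sketch below shows the pairwise loss often used for that step (a Bradley-Terry style objective); it is an assumption about a typical pipeline, not a description of any specific system, and the later policy-optimisation stage is omitted. The reward values are made-up scalars standing in for a reward model's outputs.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), in a numerically stable form."""
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Hypothetical reward scores for a response a labeller preferred vs. one they rejected.
loss_agrees = preference_loss(2.0, 0.0)     # model already ranks the pair correctly
loss_disagrees = preference_loss(0.0, 2.0)  # model ranks the pair the wrong way round
loss_tied = preference_loss(1.0, 1.0)       # no preference expressed: log(2)
```

The loss is small when the reward model scores the human-preferred response higher and grows as it disagrees with the labeller, which is what pushes the reward model toward human judgments.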

Inference & reasoning 1

Chain-of-Thought (CoT)

Generating intermediate reasoning steps as text before the final answer. Improves accuracy on multi-step problems; foundation of modern reasoning models.
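At the prompting level, the idea can be sketched as follows. The prompt wording, the `Answer:` convention, and both helper names are assumptions made for illustration, and the completion is hand-written rather than real model output; no model is called.

```python
import re

def build_cot_prompt(question):
    """Ask for worked steps first, then a clearly marked final line."""
    return (
        f"Question: {question}\n"
        "Let's think step by step, then give the result on a final line "
        "starting with 'Answer:'.\n"
    )

def extract_final_answer(completion):
    """Return the text after the last 'Answer:' line, or None if absent."""
    matches = re.findall(r"^Answer:\s*(.+)$", completion, flags=re.MULTILINE)
    return matches[-1].strip() if matches else None

prompt = build_cot_prompt(
    "A shelf holds 3 boxes of 12 pens. 5 pens are removed. How many remain?"
)

# Hand-written stand-in for what a model might return.
hypothetical_completion = (
    "3 boxes of 12 pens is 36 pens.\n"
    "Removing 5 leaves 31.\n"
    "Answer: 31"
)
answer = extract_final_answer(hypothetical_completion)
```

The intermediate lines are the chain of thought; only the marked final line is treated as the answer.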