InstructGPT
Training LMs to Follow Instructions with Human Feedback
The training recipe — supervised fine-tuning followed by reinforcement learning from human feedback — that turned GPT-3 from a raw text completer into a model that actually follows instructions. Human raters preferred the 1.3B InstructGPT model over the 175B raw GPT-3 model, suggesting that alignment can matter more than scale for user-facing tasks.
Context
2K
Up to 2,048 tokens
Why it matters
Instruction-tuned models in the InstructGPT mould, not raw GPT-3, are what people interact with when they use a "language model" today. The recipe in this paper is why the chat box on chat.openai.com is helpful instead of a chaotic autocomplete.
Core Capabilities
Long Documents
Handles entire codebases, books, and multi-doc RAG.
Research
Foundational paper or scientific contribution.
Context Window
2k tokens
Scale: 2k (short prompt) · 4k (chat) · 32k (long docs) · 128k (books) · 400k (multi-doc) · 1M (codebase) · 10M
Availability
API: Available
Product / App: Not available
Open Source: Not released
Enterprise: Contact sales
Pricing Model
Pay per token
Input and output billed separately.
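As a rough illustration of what separate input and output billing means, the arithmetic can be sketched as below. The function name and the per-1k-token prices are hypothetical placeholders, not published rates for this model.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Cost of one request when prompt and completion tokens are priced separately.

    Both prices are hypothetical inputs supplied by the caller.
    """
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Example with made-up prices: 1,000 prompt tokens and 500 completion tokens.
example_cost = request_cost(1000, 500, input_price_per_1k=0.02, output_price_per_1k=0.04)
```

Because the two rates can differ, a short prompt that triggers a long completion can cost more than a long prompt with a short answer.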
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
Context / memory (context window size, log-scaled)
Lower 20%: 6.0 · Upper 80%: 9.0 · This model: 0.0
Legend: "Lower 20%" is the 20th percentile (20% of models score below this); "Upper 80%" is the 80th percentile (only 20% of models score above this); "This model" is where the current model lands.
Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
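One way a log-scaled 0–10 context score and the 20th/80th percentile boundaries could be computed is sketched below. The page does not state its formula: the 2k and 10M endpoints are assumed from the context-window scale, and the function names and the interpolation rule are illustrative choices.

```python
import math

MIN_TOKENS = 2_000        # assumed lower end of the scale ("2k")
MAX_TOKENS = 10_000_000   # assumed upper end of the scale ("10M")


def context_score(tokens: int) -> float:
    """Map a context window size onto 0-10 using a logarithmic scale (assumed formula)."""
    clamped = min(max(tokens, MIN_TOKENS), MAX_TOKENS)
    span = math.log(MAX_TOKENS) - math.log(MIN_TOKENS)
    fraction = (math.log(clamped) - math.log(MIN_TOKENS)) / span
    return round(10 * fraction, 1)


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between the two nearest ranks."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


# Boundaries of the "middle 60%" band for a hypothetical set of model scores.
scores = [context_score(t) for t in (2_048, 4_096, 32_000, 128_000, 1_000_000)]
lower_bound, upper_bound = percentile(scores, 20), percentile(scores, 80)
```

Under these assumptions a 2,048-token window sits at the very bottom of the scale, which is consistent with the 0.0 shown for this model.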
What it feels like
- Brought RLHF (reinforcement learning from human feedback), developed in earlier work, to instruction following at scale: the recipe behind ChatGPT, Claude, Gemini
- 1.3B InstructGPT preferred over 175B GPT-3 on user prompts — alignment beat scale
- Three-step training: SFT on demos, reward model from comparisons, PPO against reward — still the standard template
- Direct precursor of ChatGPT (Nov 2022): same recipe with more conversational tuning
- TruthfulQA improvement of +10 to +25 points over base GPT-3 — measurable hallucination reduction
- Among the most cited alignment papers of the GPT era; follow-ups include Constitutional AI and DPO
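The reward-model and PPO steps of the three-step recipe listed above can be illustrated with plain numbers. This is a minimal sketch, not the paper's training code: the function names and the `beta` default are assumptions made for the example, and scalar rewards and log-probabilities stand in for real model outputs.

```python
import math
from itertools import combinations


def pairwise_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Reward-model loss for one comparison: -log(sigmoid(r_preferred - r_rejected))."""
    diff = reward_preferred - reward_rejected
    # Numerically stable form of log(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def ranking_loss(rewards_best_to_worst: list[float]) -> float:
    """Average pairwise loss over every pair taken from one human ranking."""
    pairs = list(combinations(rewards_best_to_worst, 2))
    return sum(pairwise_loss(better, worse) for better, worse in pairs) / len(pairs)


def kl_shaped_reward(reward: float, logp_policy: float, logp_sft: float,
                     beta: float = 0.02) -> float:
    """Reward used in the RL step: reward-model score minus a penalty for
    drifting away from the SFT model (beta here is an illustrative value)."""
    return reward - beta * (logp_policy - logp_sft)
```

The loss shrinks as the reward model scores the human-preferred response higher than the rejected one, and the penalty term discourages the policy from chasing reward by moving far from the supervised model.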
Best use cases
- Foundational reading on RLHF and modern instruction-tuning
- Citation in any work on alignment, preference modelling, or reward modelling
- Understanding why ChatGPT 'feels different' from raw GPT-3
Not ideal for
- Direct production use — long superseded by ChatGPT and successors
- Self-hosted / open-weights workflows (closed API only)