GPT-1
Generative Pre-Training
OpenAI's first GPT: a transformer trained to predict the next token on a large corpus of books (BooksCorpus), then fine-tuned on specific tasks. It demonstrated that "pre-train once, fine-tune cheaply" worked for language, much as ImageNet-pretrained networks had become the default starting point for vision after AlexNet in 2012.
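As a rough illustration of the next-token objective described above, the sketch below turns one token sequence into (context, target) training pairs, keeping at most 512 preceding tokens to match the context length listed on this card. The function name `next_token_pairs` and the token ids are made up for illustration; this is not OpenAI's training code, only a minimal picture of what "predict the next token" means as data.

```python
from typing import List, Tuple

CONTEXT_WINDOW = 512  # context length listed on this card


def next_token_pairs(
    token_ids: List[int], context_window: int = CONTEXT_WINDOW
) -> List[Tuple[List[int], int]]:
    """Turn one token sequence into (context, target) training pairs.

    Each pair asks the model to predict token i from the tokens before it,
    keeping at most `context_window` preceding tokens.
    """
    pairs = []
    for i in range(1, len(token_ids)):
        start = max(0, i - context_window)  # drop tokens that fall outside the window
        pairs.append((token_ids[start:i], token_ids[i]))
    return pairs


# Toy example with made-up token ids and a tiny window of 3.
for context, target in next_token_pairs([10, 11, 12, 13, 14], context_window=3):
    print(context, "->", target)
```

Fine-tuning reuses the same pretrained network and continues training on labeled task data, which is why the pre-training step only has to be paid for once.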
Context
512
Up to 512 tokens
Why it matters
Without GPT-1, nobody would have funded GPT-2; without GPT-2's "too dangerous to release" controversy, GPT-3's investor narrative would have been weaker. This is the unglamorous root of OpenAI's LLM lineage.
Core Capabilities
Generative
Produces text by predicting one token at a time; no image, video, or audio output.
Research
Foundational paper or scientific contribution.
Context Window
This model: 512 tokens
Scale markers: short prompt · 4k chat · 32k long docs · 128k books · 400k multi-doc · 1M codebase · 10M
Availability
API
Not available
Product / App
Not available
Open Source
Code and pretrained weights published by OpenAI (GitHub repository openai/finetune-transformer-lm)
Enterprise
—
Pricing Model
Research artifact
Not commercially released.
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
Context / memory (context window size, log-scaled)
Lower 20%: 6.0 · Upper 80%: 9.0 · This model: 0.0
Lower 20% is the 20th percentile: 20% of models score below it. Upper 80% is the 80th percentile: only 20% of models score above it. This model is where the current model lands.
Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
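The percentile boundaries described above can be sketched as follows. This is only an illustration under stated assumptions: the page does not say which interpolation method it uses, so the "inclusive" method of Python's `statistics.quantiles` is an arbitrary choice here, and the score list, `percentile_bounds`, and `position` are hypothetical names and data, not the site's own.

```python
import statistics
from typing import Dict, List


def percentile_bounds(scores: List[float]) -> Dict[str, float]:
    """Return the 20th and 80th percentile of the reported scores."""
    # With n=5 the cut points are the 20th, 40th, 60th, and 80th percentiles.
    cuts = statistics.quantiles(scores, n=5, method="inclusive")
    return {"lower_20": cuts[0], "upper_80": cuts[3]}


def position(score: float, bounds: Dict[str, float]) -> str:
    """Describe where one model's score lands relative to the middle 60%."""
    if score < bounds["lower_20"]:
        return "below the middle 60%"
    if score > bounds["upper_80"]:
        return "above the middle 60%"
    return "inside the middle 60%"


# Hypothetical 0-10 scores for eleven models (not real data from the tree).
scores = [float(s) for s in range(11)]
bounds = percentile_bounds(scores)
print(bounds)
print(position(0.0, bounds))
```

A model scoring 0.0 on context, as this one does, falls below the lower boundary for any score distribution whose 20th percentile is above zero.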
What it feels like
- Original GPT-1 (2018): historical; foundational paper.
- Short-context text model: prompt and output together must fit within 512 tokens.
- Needs task-specific fine-tuning to be useful; it predates instruction-following chat models.
Best use cases
- Studying the pre-train, then fine-tune recipe at small scale
- Historical baseline when comparing against later GPT models
Not ideal for
- Production or frontier work: far surpassed by every later GPT generation
- Inputs longer than 512 tokens, such as long documents, books, or codebases