LANGUAGE MODEL OpenAI

GPT-1

Generative Pre-Training

OpenAI's first GPT — a transformer trained to predict the next word on a large book dataset, then fine-tuned on specific tasks. It demonstrated that "pre-train once, fine-tune cheaply" worked for language, mirroring what AlexNet did for vision in 2012.
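As a rough illustration of what "trained to predict the next word" means for the training data, the sketch below turns a token sequence into (context, target) pairs. The function name `next_token_pairs`, the whitespace tokenization, and the tiny context size are assumptions made for the example; this is not the model's actual data pipeline.

```python
def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs for next-token prediction.

    Each target is the token at position i; its context is up to
    `context_size` tokens immediately before it.
    """
    pairs = []
    for i in range(1, len(tokens)):
        start = max(0, i - context_size)
        pairs.append((tokens[start:i], tokens[i]))
    return pairs


# Whitespace splitting stands in for a real tokenizer (assumption).
tokens = "the cat sat on the mat".split()
for context, target in next_token_pairs(tokens, context_size=3):
    print(context, "->", target)
```

A model trained on pairs like these learns to score the target given its context; fine-tuning then reuses the same network on a smaller, task-specific dataset.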

Context
512
Up to 512 tokens

Why it matters

Without GPT-1, nobody would have funded GPT-2; without GPT-2's "too dangerous to release" controversy, GPT-3's investor narrative would have been weaker. This is the unglamorous root of OpenAI's LLM lineage.

Core Capabilities

Generative
Produces text by predicting the next token.
Research
Foundational paper or scientific contribution.

Context Window

512 tokens (short prompt)
Scale: 4k (chat) · 32k (long docs) · 128k (books) · 400k (multi-doc) · 1M (codebase) · 10M
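To make the 512-token limit concrete, here is a minimal sketch of fitting an input to a fixed context window. The helper name `truncate_to_context` and the choice to keep the most recent tokens are assumptions for illustration; the page does not say how over-long inputs were handled.

```python
def truncate_to_context(tokens, limit=512):
    """Keep at most `limit` tokens, dropping the oldest ones first."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return tokens[-limit:]


# 600 placeholder token ids; only the last 512 fit in the window.
tokens = list(range(600))
kept = truncate_to_context(tokens)
print(len(kept))  # 512
```

Anything dropped this way is invisible to the model, which is why a 512-token window suits short prompts rather than long documents.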

Availability

API
Not available
Product / App
Not available
Open Source
Code and pre-trained weights released alongside the paper
Enterprise

Pricing Model

Research artifact
Not commercially released.
Research

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Context / memory
Context window size · log-scaled
Lower 20%: 6.0
Upper 80%: 9.0
This model: 0.0

Lower 20%: 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: 80th percentile; only 20% of models score above this.
Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
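The page does not state how the log-scaled context score is computed. One plausible reading, sketched below, maps 512 tokens to 0 and 10M tokens to 10 on a logarithmic axis; both endpoints are assumptions taken from the scale shown under Context Window, and `context_score` is a hypothetical name.

```python
import math

MIN_CONTEXT = 512          # assumed bottom of the scale (score 0)
MAX_CONTEXT = 10_000_000   # assumed top of the scale (score 10)


def context_score(context_tokens):
    """Map a context window size to a 0-10 score on a log scale."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    span = math.log(MAX_CONTEXT / MIN_CONTEXT)
    raw = 10.0 * math.log(context_tokens / MIN_CONTEXT) / span
    # Clamp so sizes outside the assumed range stay within 0-10.
    return max(0.0, min(10.0, raw))


print(context_score(512))  # 0.0
```

Under these assumptions a 512-token model lands at 0.0, which agrees with the score reported for this model, but the site's real formula may differ.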

What it feels like

  • Original GPT-1: historical; foundational paper.
  • Pre-trained on next-word prediction over a large book dataset, then fine-tuned per task.
  • Context limited to 512 tokens, enough for a short prompt.

Best use cases

  • Historical and research reference for the "pre-train once, fine-tune cheaply" approach
  • Short-text tasks that fit within 512 tokens after task-specific fine-tuning

Not ideal for

  • Long inputs: anything beyond the 512-token context window
  • Production use: never offered as an API or product

Radford, A. · Narasimhan, K. · Salimans, T. · Sutskever, I.