Language model · Jun 2018 · OpenAI

GPT-1

Generative Pre-Training

OpenAI's first GPT: a decoder-only transformer with about 117M parameters, trained to predict the next token on BooksCorpus (a dataset of several thousand unpublished books) and then fine-tuned on specific tasks. It demonstrated that "pre-train once, fine-tune cheaply" works for language, much as ImageNet-pretrained networks did for vision after AlexNet in 2012.
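The pre-training objective described above, predicting each token from the tokens before it, can be sketched in a few lines. This is a minimal illustration, not OpenAI's training code: the function names (`next_token_examples`, `average_nll`, `uniform_model`) and the whitespace "tokenizer" are assumptions made for the example, and a uniform toy model stands in for the transformer.

```python
import math


def next_token_examples(tokens, context_size=512):
    """Build (context, target) pairs for next-token prediction.

    Each target is predicted from the tokens that precede it,
    keeping at most `context_size` of them (512 for GPT-1).
    """
    examples = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_size):i]
        examples.append((context, tokens[i]))
    return examples


def average_nll(predict, tokens, context_size=512):
    """Average negative log-likelihood of each token given its context.

    `predict` is any callable mapping a context (list of tokens)
    to a dict of token -> probability. Hypothetical interface.
    """
    examples = next_token_examples(tokens, context_size)
    total = 0.0
    for context, target in examples:
        probs = predict(context)
        total += -math.log(probs[target])
    return total / len(examples)


def uniform_model(vocab):
    """Toy stand-in for a language model: every token equally likely."""
    p = 1.0 / len(vocab)
    return lambda context: {tok: p for tok in vocab}


# Toy corpus; a real setup would use a subword tokenizer instead of split().
tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
loss = average_nll(uniform_model(vocab), tokens)
print(f"average next-token loss: {loss:.3f}")
```

Pre-training lowers this loss over a large corpus; fine-tuning then reuses the same weights with a small task-specific output layer, which is why the second stage is comparatively cheap.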


Context

Up to 512 tokens

Official ↗ GitHub ↗

Why it matters

Without GPT-1, nobody would have funded GPT-2; without GPT-2's "too dangerous to release" controversy, GPT-3's investor narrative would have been weaker. This is the unglamorous root of OpenAI's LLM lineage.

Core Capabilities

Long Documents

Handles entire codebases, books, and multi-doc RAG.

Generative

Produces text by predicting one token at a time.

Research

Foundational paper or scientific contribution.

Context Window

512 tokens (short prompt)

Scale for reference: 4k (chat) · 32k (long docs) · 128k (books) · 400k (multi-doc) · 1M (codebase) · 10M

Availability

API

Not available

Product / App

Not available

Open Source

Code and model weights released on GitHub.

Enterprise

—

Pricing Model

Research artifact

Not commercially released.

Research

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Context / memory (context window size, log-scaled)

Lower 20%: 6.0 · Upper 80%: 9.0 · This model: 0.0

Legend: "Lower 20%" is the 20th percentile (20% of models score below it). "Upper 80%" is the 80th percentile (only 20% of models score above it). "This model" is where the current model lands. Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
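As a rough illustration of how a log-scaled 0–10 score and its percentile boundaries could be computed, here is a small sketch. The page does not publish its formula, so everything here is an assumption: the endpoints (512 tokens maps to 0, 10M tokens maps to 10), the linear-interpolation percentile, and the sample window sizes, which are simply the tick marks from the context scale rather than real model data.

```python
import math


def log_scaled_score(context_tokens, lo=512, hi=10_000_000):
    """Map a context window to a 0-10 score on a log scale.

    Assumed endpoints: `lo` tokens scores 0, `hi` tokens scores 10.
    Values outside the range are clipped.
    """
    clipped = min(max(context_tokens, lo), hi)
    span = math.log(hi) - math.log(lo)
    return 10.0 * (math.log(clipped) - math.log(lo)) / span


def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


# Hypothetical population: the tick marks of the context scale, not real models.
windows = [512, 4_000, 32_000, 128_000, 400_000, 1_000_000, 10_000_000]
scores = [log_scaled_score(w) for w in windows]

lower_20 = percentile(scores, 20)   # 20% of scores fall below this
upper_80 = percentile(scores, 80)   # only 20% of scores fall above this
this_model = log_scaled_score(512)  # GPT-1's 512-token window

print(f"lower 20%: {lower_20:.1f}, upper 80%: {upper_80:.1f}, this model: {this_model:.1f}")
```

Under these assumed endpoints a 512-token window lands at the bottom of the scale, which is consistent with the 0.0 reported for this model; the boundary values depend entirely on which models are in the population.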

What it feels like

Original GPT-1: a historical research model and the foundational paper of the GPT line.

Reviews: OpenAI — GPT-4 research page ↗ · GPT-4 Technical Report (arXiv) ↗ · MIT Press — How is ChatGPT's behavior changing ↗

Best use cases

Studying the origins of the pre-train-then-fine-tune recipe for language models
Historical baseline when comparing later GPT models

Tools to try

ChatGPT · Codex CLI · Cursor · GitHub Copilot · Continue.dev

Not ideal for

Production or frontier work: the model was never commercially released and has long been surpassed by later models
Long inputs: the context window is limited to 512 tokens


Radford, A. · Narasimhan, K. · Salimans, T. · Sutskever, I.