GPT-5
OpenAI stopped making users pick the right model. GPT-5 is one endpoint that routes each query to the right compute path: a fast model answers easy questions directly, and hard tasks switch automatically to GPT-5 Thinking for deep reasoning. Shipped in August 2025, it folded the tangle of models that had accumulated over the previous year (GPT-4o, 4o-mini, o1, o1-pro, o3, o3-mini, o4-mini) into a single entry point, so users see only "GPT-5". The real-time router is the interesting technical piece; the raw capability bump is modest.
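To make the routing idea concrete, here is a toy sketch of a difficulty-based router. It is not OpenAI's router, whose internals are not described here; the cue words, the scoring formula, the threshold, and the path names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical path names; the text only distinguishes a fast path
# from a deep-reasoning path.
FAST_PATH = "fast"
THINKING_PATH = "thinking"

# Hypothetical cue words that hint at multi-step work.
HARD_CUES = ("prove", "derive", "debug", "step by step", "optimize", "refactor")


@dataclass
class RouteDecision:
    path: str
    score: float


def difficulty_score(query: str) -> float:
    """Crude difficulty estimate in [0, 1] from query length and cue words."""
    text = query.lower()
    # Longer prompts count as harder, saturating at 200 words.
    length_part = min(len(text.split()) / 200.0, 1.0)
    # Fraction of cue words present, scaled so two cues already saturate.
    cue_fraction = sum(cue in text for cue in HARD_CUES) / len(HARD_CUES)
    cue_part = min(cue_fraction * 3.0, 1.0)
    return 0.5 * length_part + 0.5 * cue_part


def route(query: str, threshold: float = 0.2) -> RouteDecision:
    """Send the query to the thinking path when the score reaches the threshold."""
    score = difficulty_score(query)
    path = THINKING_PATH if score >= threshold else FAST_PATH
    return RouteDecision(path=path, score=score)


if __name__ == "__main__":
    for q in ("What is the capital of France?",
              "Prove that the sum of two even numbers is even, step by step"):
        print(route(q).path, "<-", q)
```

A production router would presumably rely on learned signals rather than keyword matching; the point of the sketch is only the shape of the decision: score the query, then pick a compute path.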
How are Intelligence, Speed & Cost bucketed?
Intelligence (percentile rank)
- Top 1%: ≤ 1%
- Top 5%: ≤ 5%
- Top 10%: ≤ 10%
- Good: ≤ 25%
- Medium: ≤ 50%
- Below avg: > 50%
Speed (output tokens per second)
- Top 1%: ≥ 345 tok/s
- Top 5%: ≥ 237 tok/s
- Top 10%: ≥ 196 tok/s
- Good: ≥ 146 tok/s
- Medium: ≥ 90 tok/s
- Slow: < 90 tok/s
Cost (price per million output tokens)
- Free: open weights · self-host
- Low: < $1 / M out
- Moderate: $1–5 / M out
- High: ≥ $5 / M out
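The thresholds above translate directly into lookup functions. The sketch below uses only the cut-offs listed in the legend; the function names are made up, and it assumes that the intelligence figure is a percentile rank where lower is better and that the $1–5 band includes $1 and excludes $5.

```python
def intelligence_bucket(rank_percent: float) -> str:
    """Map a percentile rank (assumed: lower is better) to its label."""
    for ceiling, label in ((1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
                           (25, "Good"), (50, "Medium")):
        if rank_percent <= ceiling:
            return label
    return "Below avg"


def speed_bucket(tokens_per_second: float) -> str:
    """Map output speed in tokens per second to its label."""
    for floor, label in ((345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
                         (146, "Good"), (90, "Medium")):
        if tokens_per_second >= floor:
            return label
    return "Slow"


def cost_bucket(usd_per_million_output: float, open_weights: bool = False) -> str:
    """Map price per million output tokens to its label."""
    if open_weights:
        return "Free"  # open weights, self-hosted
    if usd_per_million_output < 1:
        return "Low"
    if usd_per_million_output < 5:
        return "Moderate"
    return "High"
```

For example, `speed_bucket(150)` falls in the "Good" band and `cost_bucket(5)` is already "High", since the High band starts at exactly $5.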
Why it matters
Almost every AI-product headline from August 2025 through Q1 2026 traced back to GPT-5: consumer chat, Copilot-everywhere, enterprise agents. On the consumer side, every paid ChatGPT tier switched to it at once; on the business side, so did Microsoft Copilot and the enterprise API. People still argue whether the capability jump was a step change or just incremental. That argument misses the point. The real shift was that OpenAI stopped shipping seven models and started shipping one.
It also set the default shape of frontier products for 2026: a single entry point, automatic routing, and reasoning on by default.
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
What it feels like
- Set a new ceiling on Artificial Analysis Intelligence Index (68 at High effort) at release
- 94.6% on AIME 2025 without tools — math reasoning ahead of any prior frontier model
- 74.9% SWE-bench Verified — strong but trails Claude Opus 4.5's 80.9% on coding
- 23x token-cost spread between Minimal and High reasoning effort — pick effort by task carefully
- Comparable to or better than human experts in roughly half of cases across 40 occupations
- Pro tier with extended reasoning sets state-of-the-art on GPQA at 88.4% without tools
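The 23x spread between Minimal and High effort is easier to reason about with a small calculation. The sketch below measures cost in "Minimal-request units" and assumes that every request has a similar shape, so that the 23x figure applies per request; that is a simplification, and the function names are made up.

```python
# Cost ratio between High and Minimal reasoning effort, taken from the text.
HIGH_TO_MINIMAL_COST_RATIO = 23.0


def relative_cost(n_minimal: int, n_high: int) -> float:
    """Total cost in units of one Minimal-effort request."""
    return n_minimal * 1.0 + n_high * HIGH_TO_MINIMAL_COST_RATIO


def savings_vs_all_high(n_minimal: int, n_high: int) -> float:
    """Fraction saved compared with running every request at High effort."""
    total = n_minimal + n_high
    if total == 0:
        return 0.0
    all_high_cost = total * HIGH_TO_MINIMAL_COST_RATIO
    return 1.0 - relative_cost(n_minimal, n_high) / all_high_cost


if __name__ == "__main__":
    # 900 easy requests at Minimal, 100 hard ones at High.
    print(relative_cost(900, 100))                   # 3200.0
    print(round(savings_vs_all_high(900, 100), 3))   # 0.861
```

Under these assumptions, sending only the hardest 10% of traffic to High effort costs about 14% of what an all-High deployment would.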
Best use cases
- Hard math, science, and competition-style reasoning
- Multi-step research workflows where a single very-good model beats orchestration
- Replacing the o1/o3 + 4o mental switch — one model, four effort tiers
- Agent products that need both raw IQ and steerability
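"One model, four effort tiers" suggests choosing the tier per task instead of choosing a model. The sketch below builds a request payload without sending it. The task categories and their default tiers are hypothetical, the tier names other than Minimal and High are assumed, and the field layout is modeled on OpenAI's Responses API as an assumption to check against the current API reference.

```python
# Assumed tier names; the text names only Minimal and High.
EFFORT_TIERS = ("minimal", "low", "medium", "high")

# Hypothetical mapping from task type to a default effort tier.
DEFAULT_EFFORT_BY_TASK = {
    "classification": "minimal",
    "summarization": "low",
    "research": "medium",
    "competition_math": "high",
}


def build_request(task_type: str, prompt: str) -> dict:
    """Build a request payload; unknown task types fall back to medium effort."""
    effort = DEFAULT_EFFORT_BY_TASK.get(task_type, "medium")
    assert effort in EFFORT_TIERS
    return {
        "model": "gpt-5",
        "input": prompt,
        # Assumed field layout for selecting the reasoning effort.
        "reasoning": {"effort": effort},
    }


if __name__ == "__main__":
    print(build_request("competition_math", "Find all integer solutions of ..."))
```

Keeping the mapping in one table makes it straightforward to audit which workloads pay for High effort.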
Not ideal for
- Tight latency / cost budgets: Minimal effort costs 1/23 as much as High but delivers only GPT-4.1-class quality
- Specialised coding agent loops — Claude Opus 4.5 still leads on real-repo SWE-bench
- Fully offline / open-weights deployments
Model Evolution
GPT is OpenAI's language model family.