Claude 3.7 Sonnet
The first Claude model that can switch between fast standard responses and slow "extended thinking" mode in a single product — rather than offering two separate models like OpenAI's GPT-4o vs o-series split. Achieved state-of-the-art on real software engineering benchmarks (SWE-bench Verified) and powered Anthropic's first agentic IDE feature, Claude Code, released the same week.
How are Intelligence, Speed & Cost bucketed?
- Top 1%≤ 1%
- Top 5%≤ 5%
- Top 10%≤ 10%
- Good≤ 25%
- Medium≤ 50%
- Below avg> 50%
- Top 1%≥ 345 tok/s
- Top 5%≥ 237 tok/s
- Top 10%≥ 196 tok/s
- Good≥ 146 tok/s
- Medium≥ 90 tok/s
- Slow< 90 tok/s
- Freeopen weights · self-host
- Low< $1 / M out
- Moderate$1–5 / M out
- High≥ $5 / M out
Why it matters
Claude 3.7 Sonnet was the first model where "AI writes meaningful portions of production codebases autonomously" stopped being a demo and started being a measurable share of real engineering work. The SWE-bench Verified leaderboard from Q1 2025 onward reads like an industry transition.
Core Capabilities
Context Window
Availability
Pricing Model
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
What it feels like
- First hybrid-reasoning model — same checkpoint switches between fast standard mode and extended thinking
- 70.3% on SWE-bench Verified with custom scaffold — industry-leading at release; jumped from 62.3% baseline
- Output token limit jumped 8x to 64K (128K beta) — made long-form code/docs viable in one response
- Anthropic showed the chain-of-thought to users — competitor o1 hides it
- Same price as 3.5 Sonnet ($3 in / $15 out, includes thinking tokens) — extended thinking is essentially free
- Bug-find-and-fix loops finally feel reliable: read report → navigate codebase → root cause → working patch
Best use cases
- Agentic coding with Claude Code CLI — the model designed for this surface
- Math proofs, multi-step logic, and scientific analysis where extended thinking pays off
- Long-form code generation, technical writing, or analysis (up to 128K output)
- Teams that want one model toggling between fast and slow modes via prompt
Tools to try
Not ideal for
- Casual chat or copy that doesn't need reasoning — extended thinking adds latency without benefit
- Hard reasoning leaderboards by late 2025 — Claude 4 / Opus 4.5 have moved well past it
- Workloads needing 1M context (3.7 Sonnet ships 200K)