LANGUAGE MODEL · DeepSeek

DeepSeek R1

Open-Weight Reasoning Model

A Chinese AI lab released an open-weight reasoning model that approached OpenAI's o1 in quality, with training compute reportedly a fraction of what frontier US labs spend. It was released for free under a permissive license. Within a week of release, it triggered a roughly $600B drop in Nvidia's market cap as investors questioned whether closed-model moats and US AI capex assumptions still held.

Intelligence: Below avg
Cost: Moderate ($1.68 in / $4.70 out)
Context: 128K (up to 128,000 tokens)
How are Intelligence, Speed & Cost bucketed?
Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).
Intelligence
  • Top 1%: ≤ 1%
  • Top 5%: ≤ 5%
  • Top 10%: ≤ 10%
  • Good: ≤ 25%
  • Medium: ≤ 50%
  • Below avg: > 50%
Speed
  • Top 1%: ≥ 345 tok/s
  • Top 5%: ≥ 237 tok/s
  • Top 10%: ≥ 196 tok/s
  • Good: ≥ 146 tok/s
  • Medium: ≥ 90 tok/s
  • Slow: < 90 tok/s
Cost
  • Free: open weights · self-host
  • Low: < $1 / M out
  • Moderate: $1–5 / M out
  • High: ≥ $5 / M out
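
The bucket rules above can be sketched as three small lookup functions. This is a minimal illustration, not the site's actual implementation: the function names are hypothetical, the percentile rank is assumed to run from 0 to 100 with smaller meaning better, and "Free" is assumed to apply when a model has no hosted output price at all.

```python
from typing import Optional


def intelligence_bucket(percentile_rank: float) -> str:
    """Map a percentile rank (assumed 0-100, smaller is better) to a label."""
    limits = [(1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
              (25, "Good"), (50, "Medium")]
    for limit, label in limits:
        if percentile_rank <= limit:
            return label
    return "Below avg"


def speed_bucket(tokens_per_second: float) -> str:
    """Map output speed in tok/s to a label using the listed floors."""
    floors = [(345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
              (146, "Good"), (90, "Medium")]
    for floor, label in floors:
        if tokens_per_second >= floor:
            return label
    return "Slow"


def cost_bucket(output_price_per_million: Optional[float]) -> str:
    """Map the output-token price ($/M out) to a label.

    None stands for "no hosted price, self-host only" (an assumption).
    """
    if output_price_per_million is None:
        return "Free"
    if output_price_per_million < 1:
        return "Low"
    if output_price_per_million < 5:
        return "Moderate"
    return "High"


print(cost_bucket(4.70))
```

With the $4.70 output price shown on this page, `cost_bucket` returns "Moderate", matching the Cost label in the summary.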

Why it matters

R1 is the model that publicly broke the "scale is the moat" investment thesis. Whether or not its technical claims hold up under longer scrutiny, the shift in perception looks permanent: every frontier US lab now has to justify its training spend against open competitors with much lower reported costs.

Core Capabilities

Long Documents
Handles entire codebases, books, and multi-doc RAG.
Generative
Produces images, video, audio, or other media.
Agent Workflows
Built for tool use and autonomous tasks.

Context Window

128k tokens (≈ 98 pages)
Scale: 4k (chat) · 32k (long docs) · 128k (this model) · 400k (multi-doc) · 1M (codebase) · 10M

Availability

API: Not available
Product / App: Not available
Open Source: Released
Enterprise

Pricing Model

Free / self-host
Open weights — pay only for compute.

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Capability        Basis                                               Lower 20%   Upper 80%   This model
Reasoning         AA Intelligence Index, scaled to 10                 1.7         5.6         2.7
Coding            SciCode, scaled to 10                               1.8         4.3         4.0
Agentic tasks     Terminal-Bench Hard, scaled to 10                   0.2         3.6         1.6
Context / memory  Context window size, log-scaled                     6.0         9.0         6.0
Cost efficiency   Input price ($/M tokens), cheaper scores higher     6.2         10.0        6.7

Lower 20%: the 20th percentile; 20% of models score below this.
Upper 80%: the 80th percentile; only 20% of models score above this.
This model: where the current model lands.

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
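
To make the three numbers per capability easier to compare, the sketch below places each score inside its 20th-to-80th percentile band. The values are the ones listed above; the `band_position` helper and the 0-to-1 normalization are illustrative assumptions, not a metric the page itself defines.

```python
# (lower 20%, upper 80%, this model) as listed in this section
SCORES = {
    "Reasoning": (1.7, 5.6, 2.7),
    "Coding": (1.8, 4.3, 4.0),
    "Agentic tasks": (0.2, 3.6, 1.6),
    "Context / memory": (6.0, 9.0, 6.0),
    "Cost efficiency": (6.2, 10.0, 6.7),
}


def band_position(lower: float, upper: float, score: float) -> float:
    """Return where a score sits inside the band.

    0.0 means the score is at the 20th-percentile boundary and 1.0 means
    it is at the 80th; values outside [0, 1] fall outside the band.
    """
    return (score - lower) / (upper - lower)


for name, (lower, upper, score) in SCORES.items():
    print(f"{name:18s} {band_position(lower, upper, score):.2f}")
```

Read this way, coding sits near the top of the band (about 0.88), while context size sits exactly on the lower boundary (0.00).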

What it feels like

  • First open-weights reasoning model that genuinely competes with o1 — broke the closed-source moat
  • ~90% on advanced math benchmarks vs ~83% for GPT-4o; the chain-of-thought is fully visible
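  • Reported training cost of ~$5.5M on 2,048 H800s (the figure comes from the final training run of the DeepSeek-V3 base model and excludes prior research and R1's own reinforcement-learning stage), widely cited as evidence that $100M training runs are not required to reach the frontier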
  • Excellent on math/logic/coding; weaker on broad creative writing and multi-turn personality
  • Open weights mean you can self-host, fine-tune, distil — no API rate limits
  • January 2025 release sparked the 'DeepSeek moment' that hit Nvidia stock and reset cost expectations
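
As a rough sanity check on the training-cost bullet above, the reported dollar figure can be turned into GPU-hours and wall-clock time. The $2 per GPU-hour rental rate is an assumption that does not appear on this page, and the function names are hypothetical; treat the result as back-of-the-envelope only.

```python
REPORTED_COST_USD = 5.5e6        # "~$5.5M" from the bullet above
GPU_COUNT = 2048                 # "2,048 H800s" from the bullet above
ASSUMED_USD_PER_GPU_HOUR = 2.0   # assumption: rental price, not from this page


def implied_gpu_hours(cost_usd: float, usd_per_gpu_hour: float) -> float:
    """Total GPU-hours the budget buys at the assumed rental rate."""
    return cost_usd / usd_per_gpu_hour


def implied_wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Days of continuous training if every GPU runs the whole time."""
    return gpu_hours / gpu_count / 24


gpu_hours = implied_gpu_hours(REPORTED_COST_USD, ASSUMED_USD_PER_GPU_HOUR)
days = implied_wall_clock_days(gpu_hours, GPU_COUNT)
print(f"{gpu_hours:,.0f} GPU-hours, about {days:.0f} days on {GPU_COUNT} GPUs")
```

Under that assumed rate, the budget works out to about 2.75M GPU-hours, or roughly eight weeks of continuous use of the full cluster.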

Best use cases

  • Math proofs, logic puzzles, and step-by-step derivations where explicit reasoning helps
  • Coding and engineering tasks that benefit from chain-of-thought
  • On-prem / air-gapped deployments where API models can't go
  • Cost-sensitive bulk inference (≈10x cheaper than ChatGPT-class via providers)
  • Distillation into smaller open models for production
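
For the cost-sensitive bulk inference case, a quick estimate can be built from the prices shown on this page ($1.68 in / $4.70 out). The sketch assumes those prices are per million tokens, and the workload numbers are hypothetical placeholders. Output length matters most because a reasoning model bills its chain-of-thought as output tokens.

```python
INPUT_USD_PER_M = 1.68    # from this page, assumed to be per million tokens
OUTPUT_USD_PER_M = 4.70   # from this page, assumed to be per million tokens


def batch_cost_usd(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Estimate the hosted-API cost of a batch of similar requests.

    input_tokens and output_tokens are per-request averages; output_tokens
    should include the reasoning trace, since it is billed as output.
    """
    per_request = (input_tokens * INPUT_USD_PER_M
                   + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    return requests * per_request


# Hypothetical workload: 1,000 requests, 2,000 tokens in, 5,000 tokens out
print(f"${batch_cost_usd(1_000, 2_000, 5_000):.2f}")
```

For this hypothetical workload the estimate is $26.86, of which $23.50 comes from output tokens. Self-hosting the open weights replaces these per-token prices with compute costs.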

Not ideal for

  • Casual chat, tone, or creative writing — ChatGPT/Claude feel more polished
  • Multimodal tasks (image / vision) — text-only model
  • Latency-sensitive UX — reasoning trace adds significant time