DeepSeek-V2
A Chinese AI lab's 236B-parameter Mixture-of-Experts model, released open-weight with a new attention architecture, Multi-head Latent Attention (MLA), that sharply reduced inference memory by compressing the KV cache. Its API was priced at $0.28 per million output tokens, roughly two orders of magnitude cheaper than GPT-4 Turbo. The pricing itself was the news: DeepSeek had broken the perceived floor for frontier-quality LLM inference pricing.
How are Intelligence, Speed & Cost bucketed?
Intelligence (percentile rank)
- Top 1%: ≤ 1%
- Top 5%: ≤ 5%
- Top 10%: ≤ 10%
- Good: ≤ 25%
- Medium: ≤ 50%
- Below avg: > 50%
Speed (output tokens per second)
- Top 1%: ≥ 345 tok/s
- Top 5%: ≥ 237 tok/s
- Top 10%: ≥ 196 tok/s
- Good: ≥ 146 tok/s
- Medium: ≥ 90 tok/s
- Slow: < 90 tok/s
Cost (price per million output tokens)
- Free: open weights · self-host
- Low: < $1 / M out
- Moderate: $1–5 / M out
- High: ≥ $5 / M out
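The thresholds above can be read as three small lookup functions. The following is a minimal sketch, not the site's actual implementation: the function names are hypothetical, the first group is assumed to be a percentile rank for Intelligence (lower is better), and a price of exactly $5 is assumed to fall in "High" because that bucket is written as "≥ $5".

```python
from typing import Optional

# Thresholds copied from the legend; the function names are hypothetical.
INTELLIGENCE_LIMITS = [(1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
                       (25, "Good"), (50, "Medium")]
SPEED_FLOORS = [(345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
                (146, "Good"), (90, "Medium")]


def intelligence_bucket(percentile_rank: float) -> str:
    """Map a percentile rank (assumed 0-100, lower is better) to a label."""
    for limit, label in INTELLIGENCE_LIMITS:
        if percentile_rank <= limit:
            return label
    return "Below avg"


def speed_bucket(tokens_per_second: float) -> str:
    """Map output speed in tok/s to a label."""
    for floor, label in SPEED_FLOORS:
        if tokens_per_second >= floor:
            return label
    return "Slow"


def cost_bucket(usd_per_million_output: Optional[float]) -> str:
    """Map output price to a label; None stands for self-hosted open weights."""
    if usd_per_million_output is None:
        return "Free"
    if usd_per_million_output < 1:
        return "Low"
    if usd_per_million_output < 5:
        return "Moderate"
    return "High"
```

Under these assumptions, the $0.28 per million output tokens quoted for the DeepSeek-V2 API falls in the "Low" cost bucket.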
Why it matters
DeepSeek-V2 is the proximate cause of the global LLM inference price collapse in 2024-25. Every frontier lab now has to explain its pricing when a Chinese open-weight model runs at $0.28 per million output tokens and meets an estimated 90% of enterprise quality needs.
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
What it feels like
- First DeepSeek model that the wider OSS community took seriously — 236B MoE with only 21B active per token
- Compared with DeepSeek 67B: 42.5% lower training cost, 93.3% smaller KV cache, and up to 5.76x higher maximum generation throughput
- DeepSeekMoE architecture (fine-grained experts + shared expert isolation) became the template
- 128K context window at open-weights pricing — rare in mid-2024
- MMLU performance in the top open-source tier despite the small active-parameter footprint
- Foreshadowed V3 / R1 — proved the cost-efficient training playbook before it shocked Wall Street
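To make the headline figures in this list concrete, here is a minimal sketch that derives a few ratios from them. It uses only the numbers quoted above. The function name is hypothetical, and the "shrink factor" assumes the 93.3% KV-cache reduction applies uniformly per cached token.

```python
# Figures quoted in the list above (relative to DeepSeek 67B where applicable).
TOTAL_PARAMS_B = 236          # total parameters, in billions
ACTIVE_PARAMS_B = 21          # parameters active per token, in billions
KV_CACHE_REDUCTION = 0.933    # fraction of KV cache removed
TRAINING_COST_SAVING = 0.425  # fraction of training cost saved


def efficiency_summary() -> dict:
    """Derive simple ratios from the quoted figures (hypothetical helper)."""
    kv_remaining = 1 - KV_CACHE_REDUCTION
    return {
        # Share of the model's weights used for any single token.
        "active_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # KV cache size as a fraction of the baseline.
        "kv_cache_remaining": kv_remaining,
        # How many times more cached tokens fit in the same memory budget,
        # assuming the reduction is uniform per token.
        "kv_cache_shrink_factor": 1 / kv_remaining,
        # Training cost as a fraction of the baseline.
        "relative_training_cost": 1 - TRAINING_COST_SAVING,
    }


if __name__ == "__main__":
    for name, value in efficiency_summary().items():
        print(f"{name}: {value:.3f}")
```

Read this way, roughly 9% of the weights are active for each token, and the KV cache shrinks to about one fifteenth of the baseline size.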
Best use cases
- Self-hosted Chinese / multilingual deployments where API access was politically risky
- Cost-sensitive bulk inference via budget providers
- MoE-architecture research and KV-cache optimisation experiments
- Long-context retrieval workflows (128K) at open-weights price
Not ideal for
- Frontier reasoning by 2025 — superseded by V3 / V3.1 / V4 generations
- Edge / single-GPU deployments (236B MoE still demands multi-GPU serving)
- Vision / multimodal tasks (DeepSeek-VL came separately)
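The multi-GPU point above follows from simple arithmetic on weight storage. The sketch below is a rough lower bound under stated assumptions, not a serving guide. It assumes all 236B parameters stay resident in GPU memory even though only 21B are active per token, and that weights are stored at 2 bytes per parameter (BF16) or 1 byte (8-bit). The 80 GB GPU is a hypothetical example, and KV cache, activations, and framework overhead are ignored.

```python
import math


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes): billions of params x bytes each."""
    return params_billion * bytes_per_param


def min_gpus_for_weights(params_billion: float,
                         bytes_per_param: float,
                         gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory holds the weights alone."""
    return math.ceil(weight_memory_gb(params_billion, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    # 236B parameters at 2 bytes each, on hypothetical 80 GB GPUs.
    print(weight_memory_gb(236, 2))          # weights only, in GB
    print(min_gpus_for_weights(236, 2, 80))  # lower bound on GPU count
```

Under these assumptions the BF16 weights alone occupy 472 GB, which already exceeds any single accelerator. Real deployments need additional headroom beyond this lower bound.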