Kimi K2
Moonshot AI's first open-weight foundation model, released in July 2025, and a strategic pivot away from positioning Kimi as a closed, consumer-only product. It is a trillion-parameter mixture-of-experts (MoE) model with 32B parameters active per token, competitive with DeepSeek V3 on most benchmarks and with Claude Sonnet 4 on agentic coding. The release brought Moonshot back into the open-frontier conversation after a difficult 2024 in which its consumer product struggled with user growth.
How are Intelligence, Speed & Cost bucketed?
Intelligence
- Top 1%: ≤ 1%
- Top 5%: ≤ 5%
- Top 10%: ≤ 10%
- Good: ≤ 25%
- Medium: ≤ 50%
- Below avg: > 50%
Speed
- Top 1%: ≥ 345 tok/s
- Top 5%: ≥ 237 tok/s
- Top 10%: ≥ 196 tok/s
- Good: ≥ 146 tok/s
- Medium: ≥ 90 tok/s
- Slow: < 90 tok/s
Cost
- Free: open weights · self-host
- Low: < $1 / M output tokens
- Moderate: $1–5 / M output tokens
- High: ≥ $5 / M output tokens
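The bucket thresholds above can be read as simple cut-offs. The sketch below is illustrative only: the function names are hypothetical, the intelligence input is assumed to be a rank expressed as "top X percent" of models, and the $1–5 band is assumed to include $1 and exclude $5 (since "High" starts at $5).

```python
from typing import List, Tuple

# Lower bounds in tokens/second, copied from the legend, highest first.
SPEED_FLOORS: List[Tuple[float, str]] = [
    (345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
    (146, "Good"), (90, "Medium"),
]

# Upper bounds on rank ("top X percent"), copied from the legend, tightest first.
# Assumption: the legend's percentages are ranks from the top of the list.
INTELLIGENCE_CEILINGS: List[Tuple[float, str]] = [
    (1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
    (25, "Good"), (50, "Medium"),
]


def speed_bucket(tok_per_s: float) -> str:
    """Return the speed label for an output throughput in tokens/second."""
    for floor, label in SPEED_FLOORS:
        if tok_per_s >= floor:
            return label
    return "Slow"


def intelligence_bucket(top_percent: float) -> str:
    """Return the intelligence label for a rank given as 'top X percent'."""
    for ceiling, label in INTELLIGENCE_CEILINGS:
        if top_percent <= ceiling:
            return label
    return "Below avg"


def cost_bucket(usd_per_m_out: float, self_hosted_open_weights: bool = False) -> str:
    """Return the cost label for a price in USD per million output tokens."""
    if self_hosted_open_weights:
        return "Free"  # open weights run on your own hardware
    if usd_per_m_out < 1:
        return "Low"
    if usd_per_m_out < 5:
        return "Moderate"
    return "High"


if __name__ == "__main__":
    print(speed_bucket(150), intelligence_bucket(8), cost_bucket(2.5))
```

Note that "Free" here refers only to the absence of a per-token API price; self-hosting still carries hardware costs.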
Why it matters
K2 demonstrates that China's open-frontier labs extend beyond DeepSeek and Qwen: Moonshot, Zhipu, MiniMax, Baichuan, and StepFun all maintain a meaningful open-weight presence in 2025-26. The "Chinese open ecosystem" is a multi-lab phenomenon, not a single-vendor story.
Capability / Performance
Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).
What it feels like
- 1T-parameter MoE / 32B active per token — Moonshot's open-weights debut, modified MIT license
- 65.8% on SWE-bench Verified (single attempt) — at release, second only to Claude Sonnet 4 among the models tested
- 53.7% on LiveCodeBench v6 — strong open-source coding tier
- 75.1% on GPQA Diamond — beats GPT-4.1 (66.3%) and Gemini 2.5 Flash (68.2%)
- Pre-trained on 15.5T tokens with zero training instabilities — the largest stable open MoE training to date
- Built for agentic workflows — designed around tool use, not just chat
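As a rough check on the parameter figures in the first bullet, the share of the model that does work on any one token follows directly from the two numbers. This is a back-of-the-envelope sketch using the rounded "1T" and "32B" values from the text, not exact parameter counts; the variable and function names are illustrative.

```python
# Rounded figures from the text above (assumed, not exact counts).
TOTAL_PARAMS = 1_000_000_000_000   # "1T" total parameters
ACTIVE_PARAMS = 32_000_000_000     # "32B" active per token


def active_fraction(active: int, total: int) -> float:
    """Fraction of all parameters used to process a single token."""
    return active / total


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active per token: {frac:.1%} of total parameters")
```

Under these rounded numbers, only about 3.2% of the weights participate in each token's forward pass, which is why per-token compute is closer to that of a 32B model even though all 1T parameters must still be stored.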
Best use cases
- Self-hosted agent platforms where API models can't go (regulated, private cloud)
- Cost-sensitive frontier-tier inference via budget providers
- Coding agents that need both long context and open weights for fine-tuning
- Chinese/multilingual production deployments where Western platforms are blocked
Not ideal for
- Edge / single-GPU deployments — 1T MoE still demands multi-node serving
- Multimodal tasks (text-only at this generation; vision lives in Kimi-VL)
- Latency-sensitive interactive chat at the full model scale
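The single-GPU limitation above follows from weight storage alone. The sketch below estimates how many accelerators are needed just to hold the weights at a few precisions. Everything except the "1T" figure is an assumption for illustration: the 80 GB accelerator, the 8-GPU node, and the precisions listed are hypothetical choices, and the estimate ignores KV cache, activations, and runtime overhead, so real requirements are higher.

```python
TOTAL_PARAMS = 10**12            # "1T" rounded figure from the text
GPU_MEMORY_BYTES = 80 * 10**9    # assumption: a hypothetical 80 GB accelerator
GPUS_PER_NODE = 8                # assumption: a typical 8-GPU server


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def weight_bytes(params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights at the given precision."""
    return params * bits_per_param // 8


def min_gpus_for_weights(params: int, bits_per_param: int) -> int:
    """Lower bound on GPUs needed to hold the weights (nothing else)."""
    return ceil_div(weight_bytes(params, bits_per_param), GPU_MEMORY_BYTES)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        gpus = min_gpus_for_weights(TOTAL_PARAMS, bits)
        nodes = ceil_div(gpus, GPUS_PER_NODE)
        size_tb = weight_bytes(TOTAL_PARAMS, bits) / 10**12
        print(f"{bits:>2}-bit: {size_tb:.1f} TB weights, "
              f">= {gpus} GPUs, >= {nodes} node(s)")
```

Under these assumptions, 16-bit and 8-bit weights exceed what a single 8-GPU node can hold, and even 4-bit weights need several accelerators before any memory is set aside for the KV cache.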