LANGUAGE MODEL Jul 2025 Moonshot AI Last updated: Apr 29, 2026

Kimi K2

Moonshot's Open MoE Reasoning

Moonshot AI's first open-weight foundation model, released July 2025, and a strategic pivot away from positioning Kimi as a closed, consumer-only product. It is a trillion-parameter MoE with 32B parameters active per token, competitive with DeepSeek V3 on most benchmarks and with Claude Sonnet 4 on agentic coding. The release brought Moonshot back into the open-frontier conversation after a difficult 2024 in which its consumer product struggled with user growth.


Intelligence

Good

Speed

Slow

41 tok/s output

Cost

Moderate

$0.50 in / $2.85 out

Context

200K

Up to 200,000 tokens
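
As a rough illustration of what the figures above imply for a single request, the sketch below estimates API cost and output time. It assumes the listed prices are in USD per million tokens, that the 41 tok/s output rate holds steadily, and that input and output tokens share the 200,000-token window; the function name and the example token counts are hypothetical.

```python
# Rough per-request estimate from the figures listed on this page.
# Assumption: prices are USD per million tokens.
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.85
OUTPUT_TOK_PER_S = 41.0
CONTEXT_LIMIT = 200_000  # assumption: input and output share this window


def estimate_request(input_tokens: int, output_tokens: int) -> dict:
    """Return estimated cost (USD) and output generation time (seconds)."""
    if input_tokens + output_tokens > CONTEXT_LIMIT:
        raise ValueError("request exceeds the 200,000-token context window")
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    seconds = output_tokens / OUTPUT_TOK_PER_S
    return {"cost_usd": cost, "output_seconds": seconds}


if __name__ == "__main__":
    # Hypothetical request: 50,000 tokens in, 2,000 tokens out.
    print(estimate_request(50_000, 2_000))
```

At these rates the output price dominates for generation-heavy workloads, while long-document prompts are comparatively cheap per token.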

How are Intelligence, Speed & Cost bucketed?

Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).

Intelligence

Top 1%: ≤ 1%
Top 5%: ≤ 5%
Top 10%: ≤ 10%
Good: ≤ 25%
Medium: ≤ 50%
Below avg: > 50%

Speed

Top 1%: ≥ 345 tok/s
Top 5%: ≥ 237 tok/s
Top 10%: ≥ 196 tok/s
Good: ≥ 146 tok/s
Medium: ≥ 90 tok/s
Slow: < 90 tok/s

Cost

Free: open weights · self-host
Low: < $1 / M out
Moderate: $1–5 / M out
High: ≥ $5 / M out
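
The bucketing rules above can be sketched as simple threshold lookups. This is an illustrative reading of the thresholds listed on this page, not the site's actual code: the function names are hypothetical, the speed cutoffs are treated as fixed numbers, and a price of exactly $5 is assumed to fall in "High" because that bucket is written as "≥ $5".

```python
def intelligence_bucket(percentile_rank: float) -> str:
    """percentile_rank is the rank from the top, e.g. 1.0 means top 1%."""
    thresholds = [(1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
                  (25, "Good"), (50, "Medium")]
    for maximum, label in thresholds:
        if percentile_rank <= maximum:
            return label
    return "Below avg"


def speed_bucket(tok_per_s: float) -> str:
    """Map output speed (tokens per second) to the listed speed bucket."""
    thresholds = [(345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
                  (146, "Good"), (90, "Medium")]
    for minimum, label in thresholds:
        if tok_per_s >= minimum:
            return label
    return "Slow"


def cost_bucket(output_price_per_m: float, self_hosted: bool = False) -> str:
    """Map output-token price (USD per million) to the listed cost bucket."""
    if self_hosted:
        return "Free"
    if output_price_per_m < 1:
        return "Low"
    if output_price_per_m < 5:
        return "Moderate"
    return "High"  # assumption: exactly $5 counts as High


if __name__ == "__main__":
    print(speed_bucket(41))    # this model's listed output speed
    print(cost_bucket(2.85))   # this model's listed output price
```

With this model's listed values, the lookups reproduce the "Slow" and "Moderate" labels shown in the summary cards.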


Why it matters

K2 demonstrates that China's open-frontier labs are not just DeepSeek and Qwen: Moonshot, Zhipu, MiniMax, Baichuan, and StepFun all maintain a meaningful open-weight presence in 2025-26. The "Chinese open ecosystem" is a multi-lab phenomenon, not a single-vendor story.

Core Capabilities

Long Documents

Handles entire codebases, books, and multi-doc RAG.

Generative

Produces images, video, audio, or other media.

Agent Workflows

Built for tool use and autonomous tasks.

Context Window

200k tokens

≈ 154+ pages

Scale: 4k (chat) · 32k (long docs) · 128k (books) · 400k (multi-doc) · 1M (codebase) · 10M. This model: 200k.

Availability

API: Available

Product / App: Available

Open Source: Released

Enterprise: Contact sales

Pricing Model

Free / self-host

Open weights — pay only for compute.


Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Values per row: Lower 20% / Upper 80% / This model

Reasoning (AA Intelligence Index · scaled to 10): 1.7 / 5.6 / 6.7

Coding (SciCode · scaled to 10): 1.8 / 4.3 / 4.9

Agentic tasks (Terminal-Bench Hard · scaled to 10): 0.2 / 3.6 / 4.4

Context / memory (Context window size · log-scaled): 6.0 / 9.0 / 6.7

Cost efficiency (Input price in $/M tokens · cheaper scores higher): 6.2 / 10.0 (only two values appear in the source)

Lower 20%: 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: 80th percentile; only 20% of models score above this.

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
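
To make the percentile band concrete, here is a minimal sketch of how a 20th/80th percentile band could be computed and how a model's score is placed against it. The interpolation method, the function names, and the sample score list are assumptions for illustration; the page does not say how its boundaries are calculated. The two spot checks read each row above as lower bound, upper bound, then this model's score, which is also an assumption about the chart layout.

```python
def percentile(values, p):
    """Linear-interpolation percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    k = (len(ordered) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)


def classify(score, lower, upper):
    """Place a score relative to the middle-60% band [lower, upper]."""
    if score < lower:
        return "below the middle 60%"
    if score > upper:
        return "above the middle 60%"
    return "within the middle 60%"


if __name__ == "__main__":
    # Hypothetical scores for six models reporting the same benchmark.
    scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    lower, upper = percentile(scores, 20), percentile(scores, 80)
    print(lower, upper, classify(6.7, lower, upper))

    # Spot checks against the rows listed on this page.
    print("Reasoning:", classify(6.7, 1.7, 5.6))
    print("Context / memory:", classify(6.7, 6.0, 9.0))
```

Under that reading, the model sits above the band on reasoning and inside it on context window size.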

What it feels like

1T-parameter MoE with 32B active per token: Moonshot's open-weights debut, under a modified MIT license
65.8% on SWE-bench Verified (single attempt): outperforms every model tested except Claude Sonnet 4 at release
53.7% on LiveCodeBench v6: a strong result among open-weight coding models
75.1% on GPQA Diamond: beats GPT-4.1 (66.3%) and Gemini 2.5 Flash (68.2%)
Pre-trained on 15.5T tokens with zero training instabilities, per Moonshot; billed as the largest stable open MoE training run to date
Built for agentic workflows: designed around tool use, not just chat

Reviews: Moonshot AI — Kimi K2 announcement ↗ · Cline — Kimi K2 for coding first impressions ↗ · Data Science Dojo — Kimi K2 deep dive ↗

Best use cases

Self-hosted agent platforms where API models can't go (regulated, private cloud)
Cost-sensitive frontier-tier inference via budget providers
Coding agents that need long context + open weights for fine-tuning
Chinese/multilingual production deployments where Western platforms are blocked

Tools to try

Kimi · Kimi Platform

Not ideal for

Edge / single-GPU deployments — 1T MoE still demands multi-node serving
Multimodal tasks (text-only at this generation; vision lives in Kimi-VL)
Latency-sensitive interactive chat at the full model scale
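
The multi-node point in the list above follows from simple arithmetic on weight memory. The sketch below is a back-of-the-envelope estimate only: the 1T and 32B figures come from this page, while the bytes-per-parameter options, the 80 GB accelerator size, and the 8-GPU node are illustrative assumptions. It counts weights alone and ignores KV cache, activations, and framework overhead.

```python
import math

TOTAL_PARAMS = 1.0e12   # 1T total parameters (from this page)
ACTIVE_PARAMS = 32e9    # 32B active per token (from this page)
GPU_MEMORY_GB = 80      # assumption: 80 GB accelerator
GPUS_PER_NODE = 8       # assumption: typical 8-GPU server


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def min_gpus_for_weights(params: float, bytes_per_param: float,
                         gpu_memory_gb: float) -> int:
    """Smallest GPU count whose combined memory fits the weights."""
    return math.ceil(weight_memory_gb(params, bytes_per_param) / gpu_memory_gb)


if __name__ == "__main__":
    print(f"active fraction per token: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    for label, bytes_per_param in [("16-bit", 2), ("8-bit", 1)]:
        gb = weight_memory_gb(TOTAL_PARAMS, bytes_per_param)
        gpus = min_gpus_for_weights(TOTAL_PARAMS, bytes_per_param, GPU_MEMORY_GB)
        nodes = math.ceil(gpus / GPUS_PER_NODE)
        print(f"{label}: {gb:.0f} GB of weights, >= {gpus} GPUs, >= {nodes} nodes")
```

Only about 3.2% of the parameters are active for any one token, but routing can select any expert, so all of the weights still have to be resident somewhere. That is why the sparse design lowers compute per token without shrinking the serving footprint.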
