LANGUAGE MODEL · Moonshot AI

Kimi Chat

Moonshot AI's Long-Context Bet

Moonshot AI is a Beijing startup founded in March 2023 by ex-Tsinghua and ex-Google researchers. Its consumer chat product, Kimi, launched in October 2023 with a 200k-token context window, the longest in the Chinese market at the time, and was expanded to 2M tokens in March 2024. The product's "upload an entire book or annual report and ask anything" framing made it the consumer breakout among Chinese AI assistants in 2024.

Context: up to 200,000 tokens (200K)

Why it matters

Kimi made "long context" a marketing dimension that Chinese consumers shopped on, much as "fast charging" became a smartphone selling point in China before it did elsewhere. Treating long context as a product decision, not just a benchmark number, became the operating playbook.

Core Capabilities

Long Documents: Handles entire codebases, books, and multi-doc RAG.
Generative: Produces images, video, audio, or other media.

Context Window

200k tokens ≈ 154+ pages

Reference scale (context size: typical use):

  • 4k: Chat
  • 32k: Long docs
  • 128k: Books
  • 400k: Multi-doc
  • 1M: Codebase
  • 10M

This model: 200k
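
The scale above can be read as a lookup: given a document's token count, find the smallest tier that holds it and check whether it fits the 200k window. The following is a minimal Python sketch under stated assumptions: the tier boundaries are the ones listed on this card, the tokens-per-page ratio is derived from the card's own "200k tokens ≈ 154 pages" figure (roughly 1,300 tokens per page), and the function names are illustrative rather than part of any Moonshot API.

```python
from bisect import bisect_left

# Tier boundaries (in tokens) and labels, taken from the scale on this card.
TIERS = [
    (4_000, "Chat"),
    (32_000, "Long docs"),
    (128_000, "Books"),
    (400_000, "Multi-doc"),
    (1_000_000, "Codebase"),
]

# Assumption: ratio implied by the card's "200k tokens ≈ 154 pages".
TOKENS_PER_PAGE = 200_000 / 154


def smallest_tier(tokens: int) -> str:
    """Return the label of the smallest listed tier that can hold `tokens`."""
    limits = [limit for limit, _ in TIERS]
    i = bisect_left(limits, tokens)
    if i == len(TIERS):
        return "Beyond 1M"
    return TIERS[i][1]


def approx_pages(tokens: int) -> int:
    """Rough page estimate using the assumed tokens-per-page ratio."""
    return round(tokens / TOKENS_PER_PAGE)


def fits(tokens: int, window: int = 200_000) -> bool:
    """True if `tokens` fits in the given context window."""
    return tokens <= window


if __name__ == "__main__":
    for n in (3_000, 90_000, 200_000, 250_000):
        print(n, smallest_tier(n), approx_pages(n), fits(n))
```

Real token counts depend on the tokenizer and the language of the text, so the page estimate is only a rule of thumb.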

Availability

API: Available
Product / App: Available
Open Source: Not released
Enterprise: Contact sales

Pricing Model

Pay per token
Input and output billed separately.
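
Because input and output tokens are billed separately, the cost of a request is the sum of two products. The sketch below shows the arithmetic in Python; the per-million-token prices are hypothetical placeholders, not Moonshot's actual rates, and `request_cost` is an illustrative helper rather than part of any SDK.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)


def request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: Decimal,
    output_price_per_m: Decimal,
) -> Decimal:
    """Cost of one request when input and output are priced separately.

    Prices are expressed per one million tokens, in any currency.
    """
    input_cost = Decimal(input_tokens) * input_price_per_m / MILLION
    output_cost = Decimal(output_tokens) * output_price_per_m / MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical prices, chosen only to illustrate the calculation.
    hypothetical_input_price = Decimal("1.00")
    hypothetical_output_price = Decimal("3.00")

    # A long-document question: large prompt, short answer.
    cost = request_cost(
        150_000, 2_000, hypothetical_input_price, hypothetical_output_price
    )
    print(cost)
```

For long-context workloads the input term usually dominates, since the uploaded document is resent as prompt tokens while the answer stays short.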

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Context / memory (context window size, log-scaled)

  • Lower 20% (20th percentile; 20% of models score below this): 6.0
  • Upper 80% (80th percentile; only 20% of models score above this): 9.0
  • This model: 6.7

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
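
The percentile band can be sketched as follows. This is an illustration of the general idea only: the card does not say which interpolation method the site uses, so the "inclusive" method of Python's `statistics.quantiles` is an assumption, and the list of peer scores is hypothetical sample data. The final call uses the three values reported on this card.

```python
import statistics


def percentile_band(scores: list[float]) -> tuple[float, float]:
    """Return the (20th, 80th) percentile boundaries of `scores`."""
    # n=5 yields four cut points: the 20th, 40th, 60th and 80th percentiles.
    cuts = statistics.quantiles(scores, n=5, method="inclusive")
    return cuts[0], cuts[-1]


def position(score: float, lower: float, upper: float) -> str:
    """Place a score relative to the middle-60% band."""
    if score < lower:
        return "below the middle 60%"
    if score > upper:
        return "above the middle 60%"
    return "within the middle 60%"


if __name__ == "__main__":
    # Hypothetical peer scores on a 0-10 scale, for illustration only.
    hypothetical_scores = [4.5, 5.0, 6.0, 6.5, 7.0, 7.5, 8.0, 9.0, 9.5]
    lower, upper = percentile_band(hypothetical_scores)
    print(lower, upper)

    # Values reported on this card: lower 6.0, upper 9.0, this model 6.7.
    print(position(6.7, 6.0, 9.0))
```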

What it feels like

  • 1T-parameter MoE with 32B parameters active per token; Moonshot's open-weights debut, under a modified MIT license
  • 65.8% on SWE-bench Verified (single attempt); at release, outperformed every model tested except Claude Sonnet 4
  • 53.7% on LiveCodeBench v6, placing it in the strong open-source coding tier

Best use cases

  • Self-hosted agent platforms where API models can't go (regulated, private cloud)
  • Cost-sensitive frontier-tier inference via budget providers
  • Coding agents that need long context + open weights for fine-tuning

Not ideal for

  • Edge / single-GPU deployments — 1T MoE still demands multi-node serving
  • Multimodal tasks (text-only at this generation; vision lives in Kimi-VL)
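
The single-GPU limitation follows from simple arithmetic on weight storage. In a mixture-of-experts model only 32B parameters are active per token, but all 1T parameters must still be resident in memory. The sketch below estimates how many accelerators are needed for the weights alone; the 80 GB per-device capacity is an assumption, and the estimate ignores KV cache, activations, and framework overhead, so real deployments need more.

```python
def weight_bytes(params: int, bits_per_param: int) -> int:
    """Bytes needed to store `params` weights at the given precision."""
    return params * bits_per_param // 8


def min_gpus(params: int, bits_per_param: int, gpu_bytes: int) -> int:
    """Minimum device count to hold the weights alone (ceiling division)."""
    total = weight_bytes(params, bits_per_param)
    return -(-total // gpu_bytes)


TOTAL_PARAMS = 10**12      # 1T total parameters, as stated on this card
GPU_BYTES = 80 * 10**9     # assumption: 80 GB of memory per accelerator

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size_gb = weight_bytes(TOTAL_PARAMS, bits) / 10**9
        count = min_gpus(TOTAL_PARAMS, bits, GPU_BYTES)
        print(f"{bits}-bit: {size_gb:.0f} GB of weights, at least {count} GPUs")
```

Under these assumptions, 8-bit and 16-bit weights alone exceed what a single 8-GPU server can hold, which is consistent with the multi-node requirement noted above.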
