LANGUAGE MODEL · Apr 2025 · Alibaba · Last updated: Apr 29, 2026

Qwen 3

Toggleable Thinking, 119 Languages

Alibaba's open-weight Qwen 3 family covers everything from a tiny 0.6B model to a 235B mixture-of-experts (MoE) model. Every size has a switch that turns "thinking" mode on or off: same weights, two behaviors. It supports 119 languages and was the first big open release to match DeepSeek-R1 on reasoning benchmarks.


Intelligence: Medium
Speed: Slow (86 tok/s output)
Cost: Low ($0.08 in / $0.29 out, per million tokens)
Context: Up to 1,000,000 tokens
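Because the listed API prices are per million tokens, the cost of a job is simple linear arithmetic. A minimal sketch, assuming flat per-token pricing at the rates listed above with no caching discounts or volume tiers (which provider these rates come from is not stated here); `api_cost_usd` is a hypothetical helper name:

```python
def api_cost_usd(input_tokens: int, output_tokens: int,
                 in_price: float = 0.08, out_price: float = 0.29) -> float:
    """Estimate a request's cost in USD.

    Prices are dollars per million tokens; the defaults are the rates
    listed on this page and are assumed to be flat (no tiers or caching).
    """
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price


# 2M input tokens + 0.5M output tokens: 2 * 0.08 + 0.5 * 0.29 = $0.305
print(round(api_cost_usd(2_000_000, 500_000), 3))
```

Output tokens cost roughly 3.6× input tokens here, so long reasoning traces (thinking mode) dominate the bill more than long prompts do.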

How are Intelligence, Speed & Cost bucketed?

Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).

Intelligence (percentile rank)

Top 1%: ≤ 1%
Top 5%: ≤ 5%
Top 10%: ≤ 10%
Good: ≤ 25%
Medium: ≤ 50%
Below avg: > 50%

Speed (output tokens per second)

Top 1%: ≥ 345 tok/s
Top 5%: ≥ 237 tok/s
Top 10%: ≥ 196 tok/s
Good: ≥ 146 tok/s
Medium: ≥ 90 tok/s
Slow: < 90 tok/s

Cost (output price)

Free: open weights, self-host
Low: < $1 / M out
Moderate: $1–5 / M out
High: ≥ $5 / M out
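The bucketing rules above can be sketched as three threshold lookups. This is an illustration under stated assumptions, not the site's code: the function names are hypothetical, the speed cutoffs are the current tok/s values of percentile boundaries (so they will drift as the model pool changes), and "Free" is assumed to mean no hosted output price exists (encoded as `None`).

```python
from typing import Optional


def intelligence_bucket(top_pct: float) -> str:
    """top_pct is a percentile rank: 3.0 means 'in the top 3%'."""
    for limit, label in [(1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
                         (25, "Good"), (50, "Medium")]:
        if top_pct <= limit:
            return label
    return "Below avg"


def speed_bucket(tok_per_s: float) -> str:
    """Output-token speed; cutoffs are snapshots of percentile boundaries."""
    for floor, label in [(345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
                         (146, "Good"), (90, "Medium")]:
        if tok_per_s >= floor:
            return label
    return "Slow"


def cost_bucket(out_price: Optional[float]) -> str:
    """out_price in $ per million output tokens; None = self-host only (assumed)."""
    if out_price is None:
        return "Free"
    if out_price < 1:
        return "Low"
    if out_price < 5:
        return "Moderate"
    return "High"


print(speed_bucket(86), cost_bucket(0.29))  # this model's listed values
```

With the values listed on this page (86 tok/s, $0.29 / M out) the sketch reproduces the "Slow" and "Low" labels shown in the stats block.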


Why it matters

Qwen 3 demonstrated that the Chinese open-weight ecosystem could match leading reasoning models such as OpenAI's o3 and DeepSeek-R1 within months of their release, under a permissive license and across the entire size spectrum, from edge (0.6B) to frontier (235B MoE).


Core Capabilities

Long Documents: Handles entire codebases, books, and multi-doc RAG.
Multimodal: Combines text, vision, and audio in one model. (Not in the base Qwen3 text models; vision lives in the separate Qwen3-VL variant.)
Generative: Produces images, video, audio, or other media. (Not in the base Qwen3 text models.)
Agent Workflows: Built for tool use and autonomous tasks.

Context Window

1M tokens (≈ an entire codebase)

Scale: 4k ≈ chat · 32k ≈ long docs · 128k ≈ books · 400k ≈ multi-doc · 1M = this model · 10M (top of scale)
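A quick way to sanity-check whether a codebase or document set fits in the window is to estimate tokens from character count. A minimal sketch, assuming the common rule of thumb of about 4 characters per token for English text (code and CJK text tokenize differently, so use the model's own tokenizer for real counts) and the 1,000,000-token upper bound listed here, which only some variants reach; the helper names are hypothetical:

```python
import math

CONTEXT_TOKENS = 1_000_000  # "Up to" figure listed on this page; smaller for some variants


def estimated_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; 4 chars/token is an English-text heuristic."""
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(num_chars: int, reserve_for_output: int = 0,
                    chars_per_token: float = 4.0) -> bool:
    """True if prompt plus reserved output tokens fit in the window."""
    needed = estimated_tokens(num_chars, chars_per_token) + reserve_for_output
    return needed <= CONTEXT_TOKENS


# A 3.2M-character repo (~800k tokens) with 32k tokens reserved for the reply
print(fits_in_context(3_200_000, reserve_for_output=32_000))
```

Reserving output tokens matters because the window is shared between the prompt and the generated answer, including any thinking-mode reasoning.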

Availability

API: Available
Product / App: Available
Open Source: Released
Enterprise: Contact sales

Pricing Model

Free / self-host: Open weights; pay only for compute.
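"Pay only for compute" can be compared with API pricing by converting a GPU's hourly rate into dollars per million generated tokens. A minimal sketch with hypothetical inputs: the hourly rate and throughput are placeholders you supply, throughput means aggregate output tokens per second across all batched requests (not single-stream speed), and the GPU is assumed to be fully utilized with no idle time.

```python
def self_host_cost_per_million(gpu_usd_per_hour: float,
                               tokens_per_second: float) -> float:
    """Dollars per million output tokens at full utilization (assumed)."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_usd_per_hour / tokens_per_hour * 1e6


# Hypothetical: a $2/hour GPU sustaining 1,500 tok/s aggregate across a batch
print(round(self_host_cost_per_million(2.0, 1500), 3))
```

Idle time raises the effective rate proportionally, so at low utilization a hosted API can be cheaper than self-hosting even when the weights are free.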

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Reasoning (AA Intelligence Index, scaled to 10): lower 20% 1.7 · upper 80% 5.6 · this model 2.9
Coding (SciCode, scaled to 10): lower 20% 1.8 · upper 80% 4.3 · this model 3.1
Agentic tasks (Terminal-Bench Hard, scaled to 10): lower 20% 0.2 · upper 80% 3.6 · this model 0.8
Context / memory (context window size, log-scaled): lower 20% 6.0 · upper 80% 9.0
Cost efficiency (input price in $/M tokens; cheaper scores higher): lower 20% 6.2 · upper 80% 10.0

Lower 20%: the 20th percentile; 20% of models score below this. Upper 80%: the 80th percentile; only 20% of models score above this. This model: where the current model lands. Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
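The "middle 60%" band is just the 20th and 80th percentiles of each capability's scores. A minimal sketch using Python's `statistics.quantiles`; the page does not say which interpolation it uses, so `method="inclusive"` is an assumption, and `middle_band` / `position` are hypothetical names:

```python
import statistics
from typing import List, Tuple


def middle_band(scores: List[float]) -> Tuple[float, float]:
    """Return the (20th, 80th) percentile cut points of the scores."""
    # n=5 yields four cut points: the 20th, 40th, 60th and 80th percentiles.
    cuts = statistics.quantiles(scores, n=5, method="inclusive")
    return cuts[0], cuts[-1]


def position(score: float, scores: List[float]) -> str:
    """Classify a score against the middle-60% band."""
    p20, p80 = middle_band(scores)
    if score < p20:
        return "below lower 20%"
    if score > p80:
        return "above upper 80%"
    return "within middle 60%"


# Toy pool of scores 0..10; the real pool is every model reporting the benchmark
pool = [float(x) for x in range(11)]
print(middle_band(pool), position(2.9, pool))
```

Only models that report a given benchmark enter its pool, so the band edges can differ from row to row.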

What it feels like

Best open-weight reasoning model at its release: Qwen3-235B-A22B in thinking mode beats DeepSeek-R1 on 17 of 23 benchmarks
Toggleable thinking mode: the same weights serve both reasoning and fast-chat modes
Strong coverage of 119 languages, broader than most frontier-tier models
Coder variant reaches 77.2% on SWE-bench Verified, competitive with Claude Opus 4.5's 80.9%
GPQA Diamond 87.8% and AIME26 94.1%: frontier-level reasoning at open-weights prices
Apache-2.0 license and a 1M-context coder variant make it production-ready, not a research toy

Reviews: Qwen team, Qwen3 launch blog · Qwen3 Technical Report (arXiv) · MarkTechPost, Qwen3.6-27B coding benchmarks

Best use cases

Multilingual production deployments (119 languages) where most models stay English-centric
Self-hosted reasoning workflows that need both 'fast mode' and 'thinking mode' from one weight set
Open-weights agentic coding (Qwen3-Coder) with very large context windows
Cost-sensitive bulk reasoning that would be prohibitive on closed APIs
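For the original hybrid Qwen3 checkpoints, the Qwen3 model cards describe two ways to pick a mode: a hard switch (`enable_thinking=False` passed to `tokenizer.apply_chat_template` in Hugging Face transformers) and per-turn soft switches (`/think`, `/no_think`) appended to a message. Reasoning is emitted inside `<think>...</think>` tags, which stay empty in non-thinking mode. The sketch below covers only the string handling around the model call; `with_mode` and `split_thinking` are hypothetical helpers, and later Qwen3 releases may not follow this convention.

```python
import re
from typing import Tuple

# Reasoning is assumed to be wrapped in <think>...</think>, as in Qwen3 hybrid models.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def with_mode(prompt: str, thinking: bool) -> str:
    """Append the per-turn soft switch to a user message."""
    return f"{prompt} {'/think' if thinking else '/no_think'}"


def split_thinking(output: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Reasoning is '' when there is no think block or it is empty
    (the non-thinking case).
    """
    match = THINK_RE.search(output)
    if match is None:
        return "", output.strip()
    reasoning = match.group(1).strip()
    answer = (output[:match.start()] + output[match.end():]).strip()
    return reasoning, answer


print(with_mode("Summarize this contract.", thinking=False))
print(split_thinking("<think>\nCheck clause 4.\n</think>\n\nThe term is 12 months."))
```

Stripping the reasoning before display keeps the two modes interchangeable from the caller's side: only the latency and the cost of the hidden tokens differ.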

Tools to try

Qwen Chat · Hugging Face · Ollama · DashScope

Not ideal for

Multimodal tasks (the base Qwen3 models are text-only; vision lives in Qwen3-VL, audio in Qwen3-Audio)
Edge / single-consumer-GPU deployments at the 235B scale
Workflows where Western-platform compatibility is a contractual requirement
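The 235B limitation comes down to weight memory. In a mixture-of-experts model only about 22B parameters are active per token (the "A22B" in the name), but all 235B must still be resident to serve requests. A minimal sketch of the arithmetic, counting weights only and ignoring KV cache, activations and runtime overhead; `weight_memory_gb` is a hypothetical helper:

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 235_000_000_000   # all experts must be loaded
ACTIVE_PARAMS = 22_000_000_000   # parameters used per token

for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB total weights")
```

Even at 4-bit precision the weights alone need about 117.5 GB, several times a typical 24 GB consumer card, while the active subset (44 GB at 16-bit) governs per-token compute rather than memory.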