AUDIO MODEL ByteDance Last updated:

Doubao 1.5 Pro

ByteDance's Sparse MoE Flagship

ByteDance's flagship API model, released January 22, 2025, days before DeepSeek R1 took over the global news cycle. Doubao 1.5 Pro was the model behind ByteDance's domestic chat product, the Doubao app, which had ~75M MAU by the end of 2024, making it the most-used AI assistant in China by user count.

Context
128K
Up to 128,000 tokens

Why it matters

Doubao is the Chinese consumer AI product the West underestimates most. By user count, it is larger than any Western consumer AI product other than ChatGPT. ByteDance's quiet success here shapes the long-term competitive landscape more than the better-publicized DeepSeek and Qwen battles.

Core Capabilities

Long Documents: Handles entire codebases, books, and multi-doc RAG.
Multimodal: Combines text, vision, and audio in one model.
Generative: Produces images, video, audio, or other media.
Agent Workflows: Built for tool use and autonomous tasks.

Context Window

128k tokens
≈ 98 pages
4k: Chat
32k: Long docs
128k: This model
400k: Multi-doc
1M: Codebase
10M
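
A rough way to check whether a document fits in the 128k window listed above is to estimate its token count from its length. The sketch below is a minimal illustration: the 4-characters-per-token ratio and the `reserved_for_output` budget are assumptions, not figures published for this model, and the ratio is typically quite different for Chinese text. The function names are hypothetical.

```python
import math

CONTEXT_LIMIT = 128_000  # token limit listed on this page


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. The chars-per-token ratio is an assumption;
    a real tokenizer will give different counts, especially for Chinese."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str,
                    reserved_for_output: int = 4_000,
                    limit: int = CONTEXT_LIMIT) -> bool:
    """True if the estimated prompt plus an output budget stays within the limit."""
    return estimate_tokens(text) + reserved_for_output <= limit
```

For production use, counting tokens with the provider's own tokenizer would be more reliable than a length-based estimate.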

Availability

API: Available
Product / App: Not available
Open Source: Not released
Enterprise: Contact sales

Pricing Model

Pay per token
Input and output billed separately.
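
Because input and output tokens are billed separately, the cost of a request is the sum of two terms. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the per-million-token prices are parameters because this page does not list actual rates.

```python
def request_cost(input_tokens: int,
                 output_tokens: int,
                 input_price_per_million: float,
                 output_price_per_million: float) -> float:
    """Cost of one request when input and output are billed at separate rates.

    Prices are per one million tokens, in whatever currency the provider uses.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_million
    output_cost = output_tokens / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Example with placeholder prices (NOT real Doubao rates):
# 1M input tokens at 2.0 plus 500k output tokens at 4.0.
example = request_cost(1_000_000, 500_000, 2.0, 4.0)
```

Output tokens are often priced higher than input tokens, so workloads with long generations can cost more than their prompt size suggests.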

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Quality: 5.0 (placeholder, no data reported)
Speed: 5.0 (placeholder, no data reported)
Control: 5.0 (placeholder, no data reported)
Consistency: 5.0 (placeholder, no data reported)

Legend
Lower 20%: the 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: the 80th percentile; only 20% of models score above this.

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
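
The percentile bands described above can be sketched as follows. This is an illustration only: the page does not say which percentile method it uses, so linear interpolation between ranks is assumed here, and the function names are hypothetical.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks.
    The interpolation method is an assumption, not the page's documented one."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one score")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)                      # index at or below the rank
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def band(score, all_scores):
    """Place a score relative to the 20th and 80th percentile boundaries."""
    low = percentile(all_scores, 20)
    high = percentile(all_scores, 80)
    if score < low:
        return "below lower 20%"
    if score > high:
        return "above upper 80%"
    return "middle 60%"
```

With placeholder scores of 5.0 on every axis, a model would land in the middle band for any distribution whose 20th and 80th percentiles straddle 5.0, which is why the placeholders carry no real information.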

What it feels like

  • Matches GPT-4o and Claude 3.5 Sonnet on benchmarks at a ~50x lower price (the headline claim)
  • Deep Thinking mode beats OpenAI o1-preview and o1 on AIME
  • Outperformed DeepSeek-V3, GPT-4o, and Llama 3.1 405B on MMLU-Pro and GPQA
  • Sparse MoE: activates only a subset of experts per token, aiming for dense-model quality at a fraction of the compute
  • 32K to 256K context windows, flexible for both chat and long-doc work
  • Released days before DeepSeek R1 took over the news cycle; under-marketed but technically competitive
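
The sparse MoE idea in the list above, running only a few experts per token, can be illustrated with generic top-k routing. This is a toy sketch in pure Python, not ByteDance's architecture: the routing rule, the value of `k`, and all function names are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum over only the selected experts; the rest are never called."""
    return sum(weight * experts[i](x)
               for i, weight in route_top_k(router_logits, k))
```

With four experts and `k=2`, only half of the expert computation runs for a given token, which is where the compute saving relative to a dense model of the same total size comes from.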

Best use cases

  • Cost-sensitive bulk inference at frontier-tier quality (5x cheaper than DeepSeek)
  • Chinese-language production deployments via the Doubao consumer app
  • Long-doc workflows leveraging the 256K context
  • ByteDance ecosystem integrations (TikTok, CapCut, internal tools)

Not ideal for

  • Self-hosted / open-weights deployments — closed API only
  • Workflows subject to regulatory restrictions on Chinese cloud providers
  • Frontier reasoning leaderboards by mid-2025 — Claude 4, GPT-5, DeepSeek R1 lead