LANGUAGE MODEL Baidu

Ernie 4.0

Baidu's Closed Frontier Bet

Baidu's flagship closed-API LLM, launched in October 2023 as the Chinese answer to GPT-4. The Ernie line predates ChatGPT (Ernie 1.0 dates to 2019, when it was released as a BERT-style pretrained model), but the 4.0 release was the first to be marketed as a frontier-tier consumer product in China. It is integrated into Baidu Search, Baidu Maps, Baidu's autonomous-driving stack, and Wenku.

Context
8K
Up to 8,192 tokens

Why it matters

Ernie illustrates the strategic challenge faced by China's incumbent tech giants in AI: the technical assets exist, but the open-source upstarts (DeepSeek, Qwen) and consumer-distribution leaders (ByteDance Doubao) have made it hard for Baidu to convert technical parity into market share.

Core Capabilities

Long Documents
Books, codebases, and multi-doc sets exceed the 8,192-token window, so long inputs need chunking or RAG.
Generative
Produces images, video, audio, or other media.
Agent Workflows
Built for tool use and autonomous tasks.
Vision
Understands images, scenes, and visual context.

Context Window

8k tokens ≈ short doc
Scale for reference: 4k (chat), 32k (long docs), 128k (books), 400k (multi-doc), 1M (codebase), 10M
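
To make the 8,192-token limit concrete, here is a minimal sketch of checking a request against the window and splitting a longer input into pieces that fit. It assumes the window is shared between the prompt and the generated output, which this page does not state, and the helper names and the reserved-output figure are hypothetical. Token counts are taken as given; a real count depends on the model's tokenizer.

```python
CONTEXT_WINDOW = 8192  # tokens, the figure listed for this model


def fits_in_window(prompt_tokens: int, reserved_output_tokens: int,
                   window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the space reserved for the reply fits.

    Assumption: prompt and output share one window.
    """
    return prompt_tokens + reserved_output_tokens <= window


def chunk_tokens(tokens: list[str], reserved_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> list[list[str]]:
    """Split a token sequence into consecutive chunks that each leave
    room for `reserved_output_tokens` tokens of output."""
    budget = window - reserved_output_tokens
    if budget <= 0:
        raise ValueError("reserved output leaves no room for input")
    return [tokens[i:i + budget] for i in range(0, len(tokens), budget)]


# Hypothetical input: a 20,000-token document, 1,024 tokens kept for the reply.
document = ["tok"] * 20_000
chunks = chunk_tokens(document, reserved_output_tokens=1024)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk can then be sent as its own request, or indexed for retrieval so that only the relevant chunks are placed in the prompt.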

Availability

API
Available
Product / App
Not available
Open Source
Not released
Enterprise
Contact sales

Pricing Model

Pay per token
Input and output billed separately.
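
As an illustration of separate input and output billing, the sketch below totals the two charges for one request. The rates are placeholders chosen to keep the arithmetic easy to follow, not Baidu's published prices, and the names TokenPricing and request_cost are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Per-1,000-token rates. Values used below are placeholders."""
    input_per_1k: float   # cost per 1,000 input (prompt) tokens
    output_per_1k: float  # cost per 1,000 output (completion) tokens


def request_cost(pricing: TokenPricing, input_tokens: int,
                 output_tokens: int) -> float:
    """Charge input and output tokens at their own rates, then sum."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_charge = input_tokens / 1000 * pricing.input_per_1k
    output_charge = output_tokens / 1000 * pricing.output_per_1k
    return input_charge + output_charge


# Hypothetical rates: 0.5 per 1k input tokens, 1.0 per 1k output tokens.
example_pricing = TokenPricing(input_per_1k=0.5, output_per_1k=1.0)
example_cost = request_cost(example_pricing, input_tokens=6000,
                            output_tokens=2000)
print(f"{example_cost:.2f}")  # 6 * 0.5 + 2 * 1.0 = 5.00
```

Because the two rates differ, a long prompt with a short answer costs differently from a short prompt with a long answer, even when the total token count is the same.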

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Context / memory (context window size, log-scaled)
Lower 20%: 6.0
Upper 80%: 9.0
This model: 2.0

Lower 20%: 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: 80th percentile; only 20% of models score above this.
Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
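
The percentile band can be sketched as follows. This is a minimal illustration using the nearest-rank method, which is an assumption; the page does not say how its boundaries are computed. The score list and function names are hypothetical.

```python
def nearest_rank_percentile(scores: list[float], percent: int) -> float:
    """Nearest-rank percentile: the smallest score such that at least
    `percent` percent of the scores are less than or equal to it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(scores)
    # Integer ceiling of percent * n / 100, kept at 1 or above.
    rank = max(1, (percent * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def band_position(score: float, lower: float, upper: float) -> str:
    """Place one model's score relative to the 20th-80th percentile band."""
    if score < lower:
        return "below the middle 60%"
    if score > upper:
        return "above the middle 60%"
    return "within the middle 60%"


# Hypothetical 0-10 scores for ten models that report this capability.
all_scores = [2.0, 4.0, 5.0, 6.0, 6.5, 7.0, 8.0, 8.5, 9.0, 9.5]
lower_20 = nearest_rank_percentile(all_scores, 20)
upper_80 = nearest_rank_percentile(all_scores, 80)
print(lower_20, upper_80, band_position(2.0, lower_20, upper_80))
```

With the figures reported on this page (lower 6.0, upper 9.0, this model 2.0), band_position returns "below the middle 60%".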

What it feels like

  • Vision-language model from Baidu; see the linked sources below for benchmark and review coverage
  • Tool-use and agent loops are the typical fit per the published model card
  • Vision and multimodal tasks are the typical fit per the published model card

Best use cases

  • Agent / tool-use workflows that match the model's published benchmarks
  • Vision tasks (charts, documents, images) per the model card
  • See the model spec and sources block for benchmarked use cases

Not ideal for

  • Tasks far outside the modalities listed in this model's spec
  • Workflows where a more recent successor in the same family scores higher