EMBEDDING MODEL Cohere

Cohere Embed v4

128K Context Multimodal Embeddings

Cohere's April 2025 embedding model: a 128K-token context window (the longest available), multilingual coverage across 100+ languages, and native support for binary quantization, which shrinks vector size 10× without significant loss of retrieval quality. Tops MTEB with an average score of 66.3.
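To make the binary quantization idea concrete, here is a minimal sketch in plain Python. It is not Cohere's implementation: the vectors are random stand-ins, and the names `binarize` and `hamming_distance`, the 256-dimension size, and the float32 baseline are all assumptions chosen for illustration. Each dimension is reduced to its sign bit, and search compares codes by Hamming distance.

```python
import random

def binarize(vector):
    """Pack the sign of each dimension into one Python int (1 bit per dimension)."""
    code = 0
    for value in vector:
        code = (code << 1) | (1 if value > 0 else 0)
    return code

def hamming_distance(code_a, code_b):
    """Count the bit positions where two packed codes differ."""
    return bin(code_a ^ code_b).count("1")

random.seed(0)
dim = 256  # assumed dimension, for illustration only

# Random stand-in "document embeddings" (not real model output).
docs = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
# A query that is a slightly noisy copy of document 42.
query = [x + random.gauss(0.0, 0.1) for x in docs[42]]

doc_codes = [binarize(d) for d in docs]
query_code = binarize(query)

# Nearest neighbour by Hamming distance over the 1-bit codes.
nearest = min(range(len(doc_codes)),
              key=lambda i: hamming_distance(query_code, doc_codes[i]))

# Storage per vector: 4 bytes per dimension as float32 vs 1 bit per dimension.
float32_bytes = dim * 4
binary_bytes = dim // 8
compression = float32_bytes / binary_bytes
```

In this sketch the noisy query still resolves to document 42, and the storage ratio works out to 32× because the assumed baseline is float32. The ratio seen in practice depends on the starting precision and on any other compression applied alongside it.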

Context
128K
Up to 128,000 tokens

Why it matters

Embed v4 pushed embedding context windows from ~8K tokens (OpenAI text-embedding-3) to 128K, so entire chapters, contracts, or cases can be embedded as single vectors instead of being split into chunks and aggregated. This changes RAG architecture for long-document use cases.
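The difference between the two architectures can be sketched as follows. Everything here is hypothetical: `toy_embed` is a hashed bag-of-words stand-in rather than the Cohere API, word counts are used as a rough proxy for tokens, and mean pooling is only one of several ways to aggregate chunk vectors.

```python
import hashlib
import math

def toy_embed(text, dim=64):
    """Stand-in embedder (hashed bag of words). NOT the Cohere API."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def split_into_chunks(words, chunk_size):
    """Cut a word list into consecutive chunks of at most chunk_size words."""
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]

def embed_split_and_aggregate(text, chunk_size, embed=toy_embed):
    """Short-context pattern: embed each chunk, then mean-pool the vectors."""
    chunks = split_into_chunks(text.split(), chunk_size)
    if not chunks:
        raise ValueError("text must contain at least one word")
    vectors = [embed(chunk) for chunk in chunks]
    dim = len(vectors[0])
    pooled = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return pooled, len(chunks)  # one embed call per chunk

def embed_whole(text, embed=toy_embed):
    """Long-context pattern: the whole document goes in as one input."""
    return embed(text), 1  # a single embed call

# A 1,000-word stand-in "contract"; 256 words per chunk is an assumed limit.
document = " ".join(f"clause{i % 50}" for i in range(1000))
pooled_vec, chunk_calls = embed_split_and_aggregate(document, chunk_size=256)
whole_vec, whole_calls = embed_whole(document)
```

With a short context limit the pipeline needs a chunker, one embed call per chunk, and an aggregation step. With a window large enough for the full document, all three collapse into a single call.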

Core Capabilities

Long Documents
Handles entire codebases, books, and multi-doc RAG.
Multimodal
Embeds text and images in one model.
Research
Foundational paper or scientific contribution.
Vision
Understands images, scenes, and visual context.

Context Window

128k tokens (≈ 98 pages)

Scale for comparison:
4k: Chat
32k: Long docs
128k: This model
400k: Multi-doc
1M: Codebase
10M

Availability

API
Available
Product / App
Not available
Open Source
Not released
Enterprise
Contact sales

Pricing Model

Pay per token
Input and output billed separately.

What it feels like

  • Multimodal embedding model from Cohere; see the linked sources below for benchmark and review coverage
  • Vision and multimodal tasks are the typical fit per the published model card

Best use cases

  • Vision tasks (charts, documents, images) per the model card
  • See the model spec and sources block for benchmarked use cases

Not ideal for

  • Tasks far outside the modalities listed in this model's spec
  • Workflows where a more recent successor in the same family scores higher