Cohere Embed v4
128K Context Multimodal Embeddings
Cohere's April 2025 embedding model: a 128K-token context window (the longest available at release), multilingual coverage across 100+ languages, and native support for binary quantization, which stores 1 bit per dimension instead of a 32-bit float (up to 32× smaller vectors) without significant loss in retrieval quality. Top of MTEB with a 66.3 average score.
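Binary quantization in general keeps only the sign of each dimension and compares vectors by Hamming distance (count of differing bits). The sketch below illustrates that generic technique in plain Python. It is not Cohere's implementation, since the model supports binary output natively, and the function names are hypothetical. Storing one bit instead of a 4-byte float32 per dimension cuts raw vector storage by 32×; real index savings also depend on metadata and index overhead. The 1536-dimension figure in the usage lines is only an example.

```python
import math


def binarize(vec: list[float]) -> bytes:
    """Sign-threshold quantization: keep 1 bit per dimension (1 if value > 0).

    Bits are packed most-significant-bit first; a trailing partial
    byte is zero-padded.
    """
    packed = bytearray(math.ceil(len(vec) / 8))
    for i, x in enumerate(vec):
        if x > 0:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two packed vectors; lower = more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same packed length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def storage_bytes(dim: int) -> tuple[int, int]:
    """Raw bytes per vector: (float32, packed binary)."""
    return dim * 4, math.ceil(dim / 8)


if __name__ == "__main__":
    doc = [0.3, -0.2, 0.1, -0.5, 0.0, 0.9, -0.1, 0.4]
    query = [0.2, -0.1, 0.4, 0.3, -0.2, 0.8, -0.3, 0.5]
    print(binarize(doc).hex())                        # a5
    print(hamming(binarize(doc), binarize(query)))    # 1
    print(storage_bytes(1536))                        # (6144, 192)
```

At 1536 dimensions a float32 vector takes 6,144 bytes and the packed binary version 192 bytes. Retrieval systems often use the binary vectors for a fast first pass and rescore the top candidates with higher-precision vectors.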
Context: 128K (up to 128,000 tokens)
Why it matters
Pushed embedding context windows from ~8K tokens (OpenAI text-embedding-3) to 128K, so entire chapters, contracts, or cases can be embedded as a single vector instead of being split into chunks whose embeddings are then aggregated. For long-document use cases this changes RAG architecture, because far less chunking is needed.
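To make the split-and-aggregate contrast concrete, here is a minimal Python sketch. `chunk_count` computes how many fixed-size chunks a document needs for a given context window, and `mean_pool` shows one common (not the only) way to merge chunk embeddings into a single document vector. Both names are hypothetical, token counts are assumed to come from the model's own tokenizer, and 8,192 stands in for the ~8K window mentioned above.

```python
import math


def chunk_count(doc_tokens: int, window: int, overlap: int = 0) -> int:
    """Fixed-size chunks needed to cover a document of doc_tokens tokens.

    Each chunk holds up to `window` tokens; consecutive chunks share
    `overlap` tokens, so every chunk after the first adds window - overlap.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    if doc_tokens <= window:
        return 1
    stride = window - overlap
    return 1 + math.ceil((doc_tokens - window) / stride)


def mean_pool(vectors: list[list[float]]) -> list[float]:
    """Aggregate chunk embeddings into one L2-normalized document vector."""
    if not vectors:
        raise ValueError("need at least one chunk vector")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0  # avoid divide-by-zero
    return [x / norm for x in mean]


if __name__ == "__main__":
    contract_tokens = 100_000  # hypothetical long contract
    print(chunk_count(contract_tokens, 8_192))    # 13 chunks at ~8K
    print(chunk_count(contract_tokens, 128_000))  # 1 chunk at 128K
    print(mean_pool([[1.0, 0.0], [0.0, 1.0]]))
```

With an ~8K window the hypothetical 100,000-token contract becomes 13 separately embedded chunks; with 128K it fits in one. Mean pooling is simple but blends distinct sections into one average, which is one motivation for embedding long inputs natively.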
Core Capabilities
Long Documents
Handles long inputs such as full contracts, book chapters, and long reports, plus multi-doc RAG.
Multimodal
Embeds text, images, and mixed text-image inputs with one model.
Research
Foundational paper or scientific contribution.
Vision
Encodes visual content such as scanned documents, charts, and screenshots.
Context Window
128K tokens (≈ 98 pages)
4K: chat
32K: long docs
128K: this model
400K: multi-doc
1M: entire codebase
10M
Availability
API: Available
Product / App: Not available
Open Source: Not released
Enterprise: Contact sales
Pricing Model
Pay per token
Billed on input tokens; an embedding model returns vectors rather than generated text, so there are no output tokens to bill.
What it feels like
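- Multimodal embedding model from Cohere: it turns text, images, or both into vectors for search and retrieval rather than generating text; see the linked sources below for benchmark and review coverage
- Retrieval over text and visual content (documents, charts, images) is the typical fit, per the published model card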
Best use cases
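- Search and RAG over visual content (charts, documents, images), per the model card
- Long-document retrieval, where whole chapters, contracts, or cases fit in a single vector
- See the model spec and sources block for benchmarked use cases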
Tools to try
Not ideal for
- Tasks outside its input modalities (text and images), or anything that needs generated text, since it returns vectors rather than prose
- Workflows where a more recent successor in the same family scores higher