VIDEO MODEL · Feb 2026 · NVIDIA · Last updated: May 13, 2026

NVIDIA Cosmos-Predict 2.5

World Foundation Model for Physical AI

A video diffusion model trained on 200M curated clips, designed not for entertainment but for physical AI: robots, autonomous vehicles, and simulators. Given a text prompt, an image, or a seed video, it predicts how a scene evolves under physically plausible dynamics. The 2.5 release unifies Text2World, Image2World, and Video2World in one model, with action-conditioned variants for robotics policies.
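One way to picture the unified interface is as a single request type whose mode follows from the inputs supplied. The sketch below is purely illustrative: `WorldRequest` and `infer_mode` are hypothetical names, the precedence rule (seed video over image over text-only) is an assumption, and none of this is the Cosmos API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorldRequest:
    """Hypothetical container for the conditioning inputs of one generation call."""
    prompt: Optional[str] = None      # text description of the scene
    image_path: Optional[str] = None  # single starting frame
    video_path: Optional[str] = None  # short seed clip


def infer_mode(request: WorldRequest) -> str:
    """Pick a mode name from whichever conditioning input is present.

    Assumed precedence: a seed video wins over an image, which wins over
    a text-only prompt.
    """
    if request.video_path is not None:
        return "Video2World"
    if request.image_path is not None:
        return "Image2World"
    if request.prompt is not None:
        return "Text2World"
    raise ValueError("at least one of prompt, image_path, or video_path is required")


print(infer_mode(WorldRequest(prompt="a forklift crosses a warehouse aisle")))
```

The point of the sketch is only that one entry point can cover all three modes; how the released model actually selects its conditioning path is not described on this page.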


Cost

Free

Open weights — self-host

How are Intelligence, Speed & Cost bucketed?

Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).

Intelligence

Top 1%: ≤ 1%
Top 5%: ≤ 5%
Top 10%: ≤ 10%
Good: ≤ 25%
Medium: ≤ 50%
Below avg: > 50%

Speed

Top 1%: ≥ 345 tok/s
Top 5%: ≥ 237 tok/s
Top 10%: ≥ 196 tok/s
Good: ≥ 146 tok/s
Medium: ≥ 90 tok/s
Slow: < 90 tok/s

Cost

Free: open weights · self-host
Low: < $1 / M out
Moderate: $1–5 / M out
High: ≥ $5 / M out
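The speed and cost rules in the legend above can be sketched as a small lookup. This is a minimal illustration, assuming the boundaries are inclusive wherever the legend says ≥ and that the Moderate band runs from $1 up to but not including $5. The names `speed_bucket` and `cost_bucket` are hypothetical and not part of any Artificial Analysis or NVIDIA API.

```python
from typing import Optional

# Speed thresholds in tokens per second, copied from the legend above,
# ordered from the fastest bucket to the slowest.
SPEED_THRESHOLDS = [
    (345.0, "Top 1%"),
    (237.0, "Top 5%"),
    (196.0, "Top 10%"),
    (146.0, "Good"),
    (90.0, "Medium"),
]


def speed_bucket(tokens_per_second: float) -> str:
    """Return the speed bucket for a measured output rate."""
    for minimum, label in SPEED_THRESHOLDS:
        if tokens_per_second >= minimum:
            return label
    return "Slow"


def cost_bucket(usd_per_million_out: Optional[float]) -> str:
    """Return the cost bucket for an output-token price.

    None stands for an open-weights, self-hosted model with no per-token
    price (an assumption of this sketch, not a documented convention).
    """
    if usd_per_million_out is None:
        return "Free"
    if usd_per_million_out < 1.0:
        return "Low"
    if usd_per_million_out < 5.0:
        return "Moderate"
    return "High"


print(speed_bucket(200.0))
print(cost_bucket(None))
print(cost_bucket(5.0))
```

For a self-hosted open-weights model such as this one, there is no per-token price, which is why the card shows the Free bucket.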


Why it matters

This release marks the point where world models stopped being a research curiosity and became deployable infrastructure for robotics. The Cosmos family had passed 2M downloads by January 2026, and NVIDIA has partnerships with most major robotics labs. The split between "video models for entertainment" (Sora, Veo) and "video models for embodied AI" (Cosmos, Genie) crystallized around this release.

Core Capabilities

Generative

Produces images, video, audio, or other media.

Multimodal

Combines text, vision, and audio in one model.

Vision

Understands images, scenes, and visual context.

Agent Workflows

Built for tool use and autonomous tasks.

Context Window

Context window not disclosed.

Availability

API

Not available

Product / App

Not available

Open Source

Released

Enterprise

—

Pricing Model

Free / self-host

Open weights — pay only for compute.

Self-host

What it feels like

Best use cases

Robot policy training (RoboCasa and LIBERO benchmarks) (NVIDIA)
Autonomous driving simulation (NVIDIA)

Tools to try

NIM Microservices
Build catalog

Not ideal for

Turnkey hosted reliability (you’ll need deployment/ops).
Text-heavy reasoning and coding workloads (use an LLM).