VIDEO MODEL · NVIDIA

NVIDIA Cosmos-Predict 2.5

World Foundation Model for Physical AI

A video diffusion model trained on 200M curated clips and designed for physical AI (robots, autonomous vehicles, and simulators) rather than entertainment. Given a text prompt, an image, or a seed video, it predicts how a scene evolves under physically plausible dynamics. The 2.5 release unifies Text2World, Image2World, and Video2World in one model, with action-conditioned variants for robotics policies.
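
To make the "one model, three modes" idea concrete, here is a minimal sketch of how a single entry point could choose between Text2World, Image2World, and Video2World based on which inputs are supplied. Everything in it (`Mode`, `WorldRequest`, `select_mode`, and the rule that an image and a seed video are mutually exclusive) is a hypothetical illustration, not the Cosmos API or NVIDIA's actual dispatch logic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    TEXT2WORLD = "Text2World"
    IMAGE2WORLD = "Image2World"
    VIDEO2WORLD = "Video2World"


@dataclass
class WorldRequest:
    """Hypothetical request container; not part of any NVIDIA library."""
    prompt: Optional[str] = None
    image_path: Optional[str] = None
    video_path: Optional[str] = None


def select_mode(req: WorldRequest) -> Mode:
    """Pick a generation mode from the conditioning inputs that are present."""
    # Assumption for this sketch: only one visual conditioning input is allowed.
    if req.image_path and req.video_path:
        raise ValueError("provide an image or a seed video, not both")
    if req.video_path:
        return Mode.VIDEO2WORLD
    if req.image_path:
        return Mode.IMAGE2WORLD
    if req.prompt:
        return Mode.TEXT2WORLD
    raise ValueError("need a prompt, an image, or a seed video")
```

A request carrying only a prompt resolves to Text2World; adding an image or a seed video switches the mode while the prompt remains available as extra conditioning.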

Cost
Free
Open weights — self-host
How are Intelligence, Speed & Cost bucketed?
Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).
Intelligence
  • Top 1%: ≤ 1%
  • Top 5%: ≤ 5%
  • Top 10%: ≤ 10%
  • Good: ≤ 25%
  • Medium: ≤ 50%
  • Below avg: > 50%
Speed
  • Top 1%: ≥ 345 tok/s
  • Top 5%: ≥ 237 tok/s
  • Top 10%: ≥ 196 tok/s
  • Good: ≥ 146 tok/s
  • Medium: ≥ 90 tok/s
  • Slow: < 90 tok/s
Cost
  • Free: open weights · self-host
  • Low: < $1 / M out
  • Moderate: $1–5 / M out
  • High: ≥ $5 / M out
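
The Speed and Cost thresholds listed above can be read as a simple lookup. The sketch below is one way to express them; the function names are hypothetical, and two choices are assumptions rather than stated rules: a missing price (`None`) stands for open weights, and a price of exactly $5 is treated as High because that bucket is written as "≥ $5".

```python
from typing import Optional

# (minimum tok/s, label) pairs, fastest first, copied from the Speed list.
SPEED_THRESHOLDS = [
    (345, "Top 1%"),
    (237, "Top 5%"),
    (196, "Top 10%"),
    (146, "Good"),
    (90, "Medium"),
]


def speed_bucket(tokens_per_second: float) -> str:
    """Return the first Speed bucket whose minimum the value meets."""
    for minimum, label in SPEED_THRESHOLDS:
        if tokens_per_second >= minimum:
            return label
    return "Slow"


def cost_bucket(price_per_m_out: Optional[float]) -> str:
    """Map an output-token price in $/M to a Cost bucket."""
    if price_per_m_out is None:
        # Assumption: no list price means open weights, i.e. the Free bucket.
        return "Free"
    if price_per_m_out < 1:
        return "Low"
    if price_per_m_out < 5:
        return "Moderate"
    return "High"
```

Under these assumptions an open-weights model such as this one lands in the Free bucket, since there is no per-token price to compare against the thresholds.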

Why it matters

Cosmos-Predict 2.5 marks the point where world models stopped being a research curiosity and became deployable infrastructure for robotics. The Cosmos family had passed 2M downloads by January 2026, and NVIDIA has partnerships with most major robotics labs. The split between "video models for entertainment" (Sora, Veo) and "video models for embodied AI" (Cosmos, Genie) crystallized around this release.

Core Capabilities

Generative
Produces images, video, audio, or other media.
Multimodal
Combines text, vision, and audio in one model.
Vision
Understands images, scenes, and visual context.
Agent Workflows
Built for tool use and autonomous tasks.

Context Window

Context window not disclosed.

Availability

API
Not available
Product / App
Not available
Open Source
Released
Enterprise

Pricing Model

Free / self-host
Open weights — pay only for compute.
Self-host

Best use cases

  • Robot policy training (RoboCasa, LIBERO benchmarks) (NVIDIA)
  • Autonomous driving simulation (NVIDIA)

Not ideal for

  • Turnkey hosted reliability (you’ll need deployment/ops).
  • Text-heavy reasoning and coding workloads (use an LLM).