Genie 3
Real-Time Interactive World Model
DeepMind's world model, announced in August 2025, generates a walkable, controllable world at 720p / 24fps from a text prompt and keeps it consistent for several minutes. Move forward or turn, and Genie 3 generates the next frame in real time based on your action. The result is a playable environment rather than a passive video.
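The action-conditioned, frame-by-frame loop described above can be sketched in Python. This is a hypothetical toy, not DeepMind's implementation: Genie 3's architecture, weights, and API are not public, and `ToyWorldModel`, `next_frame`, `run`, and the `memory_frames` setting are invented names. The "model" returns placeholder records instead of pixels. It only illustrates two constraints the description implies: each frame is produced after, and conditioned on, the latest action plus a bounded window of earlier frames, and at 24fps every frame must be ready within about 41.7 ms.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

FPS = 24
FRAME_BUDGET_S = 1.0 / FPS  # about 41.7 ms per frame to keep up with 24fps


@dataclass(frozen=True)
class Frame:
    index: int        # position in the generated stream
    action: str       # user action this frame was conditioned on
    context_len: int  # number of past frames visible when it was generated


class ToyWorldModel:
    """Hypothetical stand-in for an action-conditioned world model.

    Genie 3's internals are not public; this only mimics the shape of
    autoregressive generation: next frame = f(recent frames, action).
    """

    def __init__(self, memory_frames: int) -> None:
        # Bounded memory: the oldest frames fall out once the window is full.
        self.history: Deque[Frame] = deque(maxlen=memory_frames)
        self.count = 0

    def next_frame(self, action: str) -> Frame:
        # A real model would predict pixels from (self.history, action) here.
        frame = Frame(self.count, action, len(self.history))
        self.history.append(frame)
        self.count += 1
        return frame


def run(model: ToyWorldModel, actions: List[str]) -> Tuple[List[Frame], int]:
    """Generate one frame per action, in order; count frames over budget."""
    frames: List[Frame] = []
    over_budget = 0
    for action in actions:
        start = time.perf_counter()
        frames.append(model.next_frame(action))
        if time.perf_counter() - start > FRAME_BUDGET_S:
            over_budget += 1  # would show up as stutter in a live session
    return frames, over_budget


if __name__ == "__main__":
    # Assumption: roughly one minute of frame memory (24 * 60 = 1440 frames).
    model = ToyWorldModel(memory_frames=FPS * 60)
    frames, _ = run(model, ["forward", "forward", "turn_left"])
    print(len(frames), frames[-1].context_len)  # 3 2
    print(f"{FRAME_BUDGET_S * 1000:.1f} ms per frame")  # 41.7 ms per frame
```

The bounded `deque` is just one simple way to model finite memory; the one-minute window is an illustrative assumption, not a published Genie 3 figure.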
Why it matters
The first credible interactive world model at production fidelity. The category also includes Decart's Oasis (Minecraft), Microsoft Muse, Wayve GAIA-2 (driving), and World Labs Marble (3D scenes). Genie 3 sets the trajectory for "AI generates the simulator, not just the agent."
Core Capabilities
Generative
Produces video: a continuous stream of 720p frames at 24fps.
Multimodal
Takes a text prompt plus the user's navigation inputs and returns video.
Agent Workflows
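Serves as a generated environment for training and evaluating embodied agents (DeepMind tested it with its SIMA agent); it is not a tool-calling model.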
Vision
Generates coherent visual scenes; it is not an image-understanding (vision-language) model.
Context Window
Context window not disclosed.
Availability
API
Not available
Product / App
Not available
Open Source
Not released
Enterprise
Not announced
Pricing Model
Demo access
Limited research preview for a small group of academics and creators.
What it feels like
- Type a text prompt and step into the scene it describes; new frames stream in as you steer, with no render wait
- Scenes stay largely consistent for several minutes, so areas you have already passed generally look the same when you return
- Text prompts can also trigger world events mid-session, such as a change in weather
Best use cases
- Training and evaluating embodied agents in generated environments
- Prototyping explorable scenes for games and education
- Research into world-model consistency and controllability
Not ideal for
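- Anything that needs programmatic access: there is no public API, app, or open-source release
- Long sessions: consistency holds for minutes, not hours
- Complex, agent-driven interactions, since the set of actions an agent can perform directly is limited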