MODEL · Aug 2025 · Google/DeepMind · Last updated: Apr 29, 2026

Genie 3

Real-Time Interactive World Model

DeepMind's August 2025 world model generates a walkable, controllable world at 720p and 24 fps from a single image prompt and keeps it temporally coherent for minutes. Move forward, turn, pick up objects: Genie 3 generates each next frame in real time based on your action. It is the first generative world model that is actually a playable environment, not a passive video.

Why it matters

Genie 3 is the first credible interactive world model at production fidelity, in a category that also includes Decart's Oasis (Minecraft), Microsoft Muse, Wayve GAIA-2 (driving), and World Labs Marble (3D scenes). It sets the trajectory for "AI generates the simulator, not just the agent."

Core Capabilities

Generative

Produces images, video, audio, or other media.

Multimodal

Combines text, vision, and audio in one model.

Agent Workflows

Built for tool use and autonomous tasks.

Vision

Understands images, scenes, and visual context.

Context Window

Context window not disclosed.

Availability

API

Not available

Product / App

Not available

Open Source

Not released

Enterprise

—

Pricing Model

Demo access

Limited / waitlisted.

What it feels like

Real-time interactive world model from Google DeepMind; see the linked sources below for benchmark and review coverage
Tool-use and agent loops are the typical fit per the published model card
Vision and multimodal tasks are the typical fit per the published model card

Best use cases

Agent / tool-use workflows that match the model's published benchmarks
Vision tasks (charts, documents, images) per the model card
See the model spec and sources block for benchmarked use cases

Tools to try

Gemini app, AI Studio, Vertex AI

Not ideal for

Tasks far outside the modalities listed in this model's spec
Workflows where a more recent successor in the same family scores higher