IMAGE MODEL · Nov 2025 · World Labs · Last updated: May 13, 2026

World Labs Marble

Spatially-Intelligent 3D World Generator

A generative model that produces editable 3D worlds — geometry, materials, lighting — from a text prompt or a reference image. The output drops into existing game engines and VFX pipelines as actual 3D assets, not video frames.


Why it matters

The first commercial release under Fei-Fei Li's "spatial intelligence" framing: the claim that AI's next frontier is 3D-grounded perception and generation, not larger language models. Marble outputs editable 3D rather than video, which is its structural distinction from the Sora / Veo line. If the framing holds, this is the seed of a parallel branch of generative AI.

Core Capabilities

Generative

Produces images, video, audio, or other media.

Vision

Understands images, scenes, and visual context.

Multimodal

Combines text, vision, and audio in one model.

Context Window

Context window not disclosed.

Availability

API

Available

Product / App

Available

Open Source

Not released

Enterprise

Contact sales

Pricing Model

Pay per token

Input and output billed separately.
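
As a rough illustration of what separate input and output billing means for budgeting, the sketch below adds the two charges together. The rates are placeholder values chosen only to make the arithmetic visible. No per-token prices are given here, and the names `TokenRates` and `estimate_cost` are hypothetical helpers, not part of any World Labs API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical prices in USD per one million tokens (placeholders, not published rates)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens: int, output_tokens: int, rates: TokenRates) -> float:
    """Return the estimated charge in USD when input and output are billed separately."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_charge = input_tokens * rates.input_per_million
    output_charge = output_tokens * rates.output_per_million
    # Sum both charges first, then scale from per-million pricing to per-token.
    return (input_charge + output_charge) / 1_000_000


# Placeholder rates: assumed values for illustration only.
example_rates = TokenRates(input_per_million=2.0, output_per_million=10.0)
example_cost = estimate_cost(input_tokens=1_000, output_tokens=50_000, rates=example_rates)
print(f"Estimated cost: ${example_cost:.3f}")
```

With separate rates, a short prompt that produces a large output is dominated by the output charge, so the two sides are worth tracking independently when estimating spend.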


What it feels like

Best use cases

Game development asset generation (World Labs)
Visual-effects pre-visualization (World Labs)

Not ideal for

Fully offline / air-gapped deployment.
Text-heavy reasoning and coding workloads (use an LLM).

Li, Fei-Fei · Johnson, Justin · Lassner, Christoph · Mildenhall, Ben