VIDEO MODEL · OpenAI

Sora

OpenAI's Text-to-Video

Sora is OpenAI's text-to-video model, first demonstrated as a research preview in February 2024. The viral demo videos (a fashion shoot in Tokyo, a snow leopard in the mountains, vintage California drone footage) convinced much of the public that AI-generated video was approaching the point where it could be mistaken for real footage. OpenAI held the model back from public release until December 2024 (Sora 1.0) and followed it with a substantially revised Sora 2 in late 2025.

Why it matters

Sora is the inflection point where video joined image, text, audio, and code as a domain where AI-generated content was no longer obviously distinguishable from human-produced content. The implications for advertising, film, content moderation, and disinformation are still being absorbed.

Core Capabilities

Generative
Produces images, video, audio, or other media.
Multimodal
Combines text, vision, and audio in one model.

Context Window

Context window not disclosed.

Availability

API
Not available
Product / App
Not available
Open Source
Not released
Enterprise

Pricing Model

Demo access
Limited / waitlisted.

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Legend: "Lower 20%" is the 20th percentile (20% of models score below it); "Upper 80%" is the 80th percentile (only 20% of models score above it); "This model" marks where the current model lands. Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.

Quality
No data reported · placeholder 5.0
Speed
No data reported · placeholder 5.0
Control
No data reported · placeholder 5.0
Consistency
No data reported · placeholder 5.0
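
The percentile boundaries described above can be sketched in a few lines. This is only an illustration under stated assumptions: the page does not say how its boundaries are interpolated, so the sketch uses the standard library's inclusive quantile method, and the function names `percentile_band` and `place` are hypothetical. Models that report no score (such as the placeholder entries here) are excluded from the calculation.

```python
from statistics import quantiles


def percentile_band(scores, lower_pct=20, upper_pct=80):
    """Return (lower, upper) percentile boundaries over the reported scores.

    `scores` may contain None for models that report no benchmark;
    those entries are skipped. Returns None if fewer than two models
    report a score, since no band can be computed.
    Assumption: linear interpolation between ranks ("inclusive" method).
    """
    reported = [s for s in scores if s is not None]
    if len(reported) < 2:
        return None
    # n=100 yields 99 cut points; index p-1 is the p-th percentile.
    cuts = quantiles(reported, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def place(score, band):
    """Describe where one model's score falls relative to the band."""
    if score is None or band is None:
        return "no data"
    lower, upper = band
    if score < lower:
        return "below the middle 60%"
    if score > upper:
        return "above the middle 60%"
    return "within the middle 60%"


# Illustrative scores on the 0-10 scale; None marks a model with no data.
band = percentile_band([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, None])
print(band)               # (2.0, 8.0)
print(place(5.0, band))   # within the middle 60%
print(place(None, band))  # no data
```

A model with no reported benchmark is labelled "no data" instead of being scored against the band, which matches how the placeholder 5.0 values here should be read.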

What it feels like

  • Earlier Sora preview, since succeeded by Sora 2. The points below describe Sora 2.
  • Sora 2 is the first Sora generation with native synchronized audio: speech, sound effects, and ambient soundscape from one model
  • More realistic physics: a missed basketball shot rebounds off the backboard; objects respect buoyancy and rigidity
  • Olympic gymnastics, paddleboard backflips, ice-skating triple axels: motion that prior systems couldn't render

Best use cases

  • Short-form social video generation (the Sora app's whole purpose)
  • Storyboards / previz where physical accuracy matters more than fine-grained creative control
  • Custom-character video using the app's cameo feature for personalisation

Not ideal for

  • Frame-perfect creative control — no full keyframe editing the way Runway / Kling offer
  • Long-form (>60s) coherent narrative — best on short clips
