AUDIO MODEL Sep 2025 OpenAI Last updated: Apr 29, 2026

Sora 2

Native Audio + Improved Physics

OpenAI's September 2025 Sora successor — the first version with native synchronized audio generation (sound effects, speech, ambient audio matched to visuals) and substantially improved physical consistency (objects fall correctly, fluids flow plausibly, characters maintain identity across cuts). Distributed initially through a TikTok-style consumer app where users could generate and remix short AI videos with friends.

Try ChatGPT API Docs ↗

Official ↗

Why it matters

Sora 2 represents the moment AI video moved from research demo to ambient consumer product. The downstream implications — for content moderation, deepfakes, advertising labor markets, and the social-media platform landscape — are still being absorbed in 2026.

Core Capabilities

Generative

Produces images, video, audio, or other media.

Audio

Speech, music, or other audio understanding/synthesis.

Multimodal

Combines text, vision, and audio in one model.

Context Window

Context window not disclosed.

Availability

API

Not available

Product / App

Available

Open Source

Not released

Enterprise

Contact sales

Pricing Model

Subscription

Bundled inside the host product.

Subscription

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Lower 20% Upper 80% This model

Quality

No data reported · placeholder

5.0

Speed

No data reported · placeholder

5.0

Control

No data reported · placeholder

5.0

Consistency

No data reported · placeholder

5.0

Lower 20% 20th percentile — 20% of models score below this This model Where the current model lands Upper 80% 80th percentile — only 20% of models score above this Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.

What it feels like

First Sora generation with native synchronized audio — speech, sound effects, ambient soundscape from one model
Real physics: missed basketball rebounds off the backboard; objects respect buoyancy and rigidity
Olympic gymnastics, paddleboard backflips, ice-skating triple axels — motion that prior systems couldn't render
Cameo feature: insert real team members from a reference video into any generated scene with accurate voice
NYT called the September 2025 launch 'jaw-dropping (for better and worse)' — TikTok-style social app launched alongside
Per OpenAI's own framing: errors look like mistakes of the implicit agent, not the model — failure modes are more 'physical' than 'glitch'

Reviews: OpenAI — Sora 2 announcement ↗ · Wikipedia — Sora text-to-video model ↗ · Cybernews — Sora 2 review ↗

Best use cases

Short-form social video generation (the Sora app's whole purpose)
Storyboards / previz where physical accuracy matters more than fine-grained creative control
Custom-character video using cameo for personalisation
Sound + video pipelines that previously needed separate models stitched together

Tools to try

ChatGPT Codex CLI Cursor GitHub Copilot Continue.dev

Not ideal for

Frame-perfect creative control — no full keyframe editing the way Runway / Kling offer
Long-form (>60s) coherent narrative — best on short clips
Production work where IP ownership / training-data provenance matters legally

Model Evolution

View full evolution tree →