Sora 2
OpenAI's September 2025 successor to Sora, and the first version with native synchronized audio generation (sound effects, speech, and ambient audio matched to the visuals) and substantially improved physical consistency (objects fall correctly, fluids flow plausibly, characters maintain identity across cuts). Distributed initially through a TikTok-style consumer app where users could generate and remix short AI videos with friends.
Why it matters
Sora 2 represents the moment AI video moved from research demo to everyday consumer product. Its downstream implications for content moderation, deepfakes, advertising labor markets, and the social-media platform landscape are still being absorbed in 2026.
Core Capabilities
Context Window
Context window not disclosed.
Capability / Performance
(Chart not preserved: where this model sits relative to the middle 60% of models in the tree, with all scores on a 0–10 scale, higher is better.)
What it feels like
- First Sora generation with native synchronized audio — speech, sound effects, ambient soundscape from one model
- More plausible physics: a missed basketball shot rebounds off the backboard; objects respect buoyancy and rigidity
- Olympic gymnastics routines, paddleboard backflips, ice-skating triple axels: motion that prior systems struggled to render
- Cameo feature: insert a real person, captured in a short reference video, into a generated scene with accurate appearance and voice
- The New York Times called the September 2025 launch 'jaw-dropping (for better and worse)'; a TikTok-style social app launched alongside the model
- Per OpenAI's own framing, errors often look like mistakes by the agent the model is implicitly simulating, not by the model itself: failure modes read as 'physical' rather than as glitches
Best use cases
- Short-form social video generation (the Sora app's whole purpose)
- Storyboards / previz where physical accuracy matters more than fine-grained creative control
- Custom-character video using cameo for personalisation
- Sound + video pipelines that previously needed separate models stitched together
Not ideal for
- Frame-perfect creative control: no full keyframe editing of the kind Runway and Kling offer
- Long-form (>60s) coherent narrative — best on short clips
- Production work where IP ownership / training-data provenance matters legally