MODEL · Apr 2025 · OpenAI · Last updated: Apr 29, 2026

GPT Image 1 / 1.5

OpenAI's Image Frontier

OpenAI's image generator, integrated directly into ChatGPT and the Responses API. Released as gpt-image-1 in April 2025 and refined to gpt-image-1.5 in December 2025, it excels at prompt adherence, in-image text rendering, and style transfer. As of this page's last update it leads the Artificial Analysis image arena at ~1264 Elo.

Why it matters

Replaced DALL-E as OpenAI's image model and reasserted OpenAI's frontier position after a year in which Midjourney and Flux led. It also made the case that multimodal models sharing one backbone can beat single-modality image specialists on prompt adherence.

Core Capabilities

Generative

Produces images, video, audio, or other media.

Multimodal

Combines text, vision, and audio in one model.

Vision

Understands images, scenes, and visual context.

Context Window

Context window not disclosed.

Availability

API

Available

Product / App

Available (in ChatGPT)

Open Source

Not released

Enterprise

Contact sales

Pricing Model

Pay per token

Input and output billed separately.
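Because input and output are billed separately, a rough cost estimate is just two multiplications. The sketch below is illustrative only: the function name, the token counts, and both per-million rates are hypothetical placeholders, not prices quoted on this page.

```python
def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate_per_m, output_rate_per_m):
    """Estimate the cost of one request in USD.

    Rates are expressed in USD per one million tokens, and input and
    output tokens are charged at separate rates.
    """
    input_cost = input_tokens * input_rate_per_m
    output_cost = output_tokens * output_rate_per_m
    return (input_cost + output_cost) / 1_000_000


# Hypothetical numbers for illustration only; check the current price list.
example = estimate_cost_usd(
    input_tokens=200,        # a short text prompt
    output_tokens=4_000,     # tokens billed for the generated image
    input_rate_per_m=5.0,    # placeholder rate
    output_rate_per_m=40.0,  # placeholder rate
)
```

With these placeholder values, output tokens dominate the bill, which is the usual pattern for image generation: the prompt is small and the image is large.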


What it feels like

OpenAI's first natively multimodal image model: text and image inputs share the same transformer backbone rather than feeding a separate diffusion stack
130M users created 700M images in the first week of the ChatGPT image rollout — extreme product-market fit
Excels at text rendering, prompt adherence, and using GPT-4o's world-knowledge for grounded composition
Still struggles with non-Latin scripts (Chinese / Japanese / Arabic) and small in-image text
Available via ChatGPT, the Images 2.0 product, and the Responses API as gpt-image-1
Successor GPT Image 1.5 (Dec 2025) generates up to 4x faster; the original now feels slow at 10-30s per image
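As a minimal sketch of calling the model by name over HTTP, the snippet below builds a generation request using only the Python standard library. It assumes OpenAI's public Images endpoint (`/v1/images/generations`) rather than the Responses API, and it assumes the response carries base64 image data under `data[0].b64_json`. The helper names `build_request` and `generate_png` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed endpoint; not stated on this page.
API_URL = "https://api.openai.com/v1/images/generations"


def build_request(prompt, api_key, model="gpt-image-1", size="1024x1024"):
    """Build (but do not send) the HTTP request for a single image."""
    payload = {"model": model, "prompt": prompt, "size": size, "n": 1}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_png(prompt, api_key, timeout=120):
    """Send the request and return the decoded image bytes."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: image data is returned base64-encoded in data[0].b64_json.
    return base64.b64decode(body["data"][0]["b64_json"])
```

A caller would pass its own API key and write the returned bytes to a file. The generous timeout reflects the 10-30s generation times noted above.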

Reviews: OpenAI — Introducing image generation in the API ↗ · OpenAI — Introducing 4o image generation ↗ · InfoQ — Improved image generation in GPT-4o ↗

Best use cases

Conversational image generation inside ChatGPT — context carries from prompt to prompt
Marketing graphics that need readable English text in the image
Image editing with reference inputs (the multimodal input is a real product feature)
Workflows that want to use one OpenAI account for text + image without juggling APIs
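Editing with a reference image can be sketched in the same way. The example below assumes the Images edit endpoint (`/v1/images/edits`) accepts a multipart form with `model`, `prompt`, and an `image` file field. The helper name `build_edit_request` and the default filename are hypothetical, and the request is only built, not sent.

```python
import urllib.request
import uuid

# Assumed endpoint; not stated on this page.
EDITS_URL = "https://api.openai.com/v1/images/edits"


def build_edit_request(prompt, image_bytes, api_key,
                       filename="reference.png", model="gpt-image-1"):
    """Build a multipart/form-data request carrying one reference image."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields.
    for name, value in (("model", model), ("prompt", prompt)):
        field = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(field.encode("utf-8"))

    # The reference image as a file field (PNG is assumed here).
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: image/png\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + image_bytes + b"\r\n")

    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        EDITS_URL,
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and decoding the result would follow the same pattern as a plain generation call.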

Tools to try

ChatGPT · Codex CLI · Cursor · GitHub Copilot · Continue.dev

Not ideal for

In-image text in non-Latin scripts (Chinese, Japanese, Arabic), where the rendered characters are often hallucinated
Latency-sensitive bulk generation (10-30s per image; competitors are faster)
Pure aesthetic / illustration work where Midjourney V7 still has a stylistic edge
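The latency point is easy to quantify with back-of-the-envelope arithmetic. The sketch below estimates wall-clock time for a batch from the 10-30s per-image range above. The `concurrency` parameter is an assumption: this page does not say how many parallel requests an account may run, and the function name is hypothetical.

```python
import math


def batch_wall_time_s(n_images, seconds_per_image, concurrency=1):
    """Estimate wall-clock seconds to generate n_images.

    Images are assumed to be processed in waves of `concurrency`
    parallel requests, each taking `seconds_per_image`.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    waves = math.ceil(n_images / concurrency)
    return waves * seconds_per_image


# 1,000 images, one request at a time, at both ends of the latency range.
best_case = batch_wall_time_s(1000, 10)   # 10,000 s, a little under 3 hours
worst_case = batch_wall_time_s(1000, 30)  # 30,000 s, a little over 8 hours
```

Parallel requests shrink the total proportionally (8 workers at 30s each gives 3,750 s for the same batch), but only as far as rate limits allow.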
