MODEL OpenAI

GPT Image 1 / 1.5

OpenAI's Image Frontier

OpenAI's image generator, integrated directly into ChatGPT and the Responses API. Released as gpt-image-1 in April 2025 and refined to gpt-image-1.5 in late 2025, it excels at prompt adherence, in-image text rendering, and style transfer. Currently leads the Artificial Analysis image arena at ~1264 Elo.

Why it matters

Replaced DALL-E as OpenAI's image model and reasserted OpenAI's frontier position after a year in which Midjourney and Flux led. Demonstrated that multimodal models sharing one backbone can beat single-modality image specialists on prompt adherence.

Core Capabilities

Generative
Produces images, video, audio, or other media.
Multimodal
Combines text, vision, and audio in one model.
Vision
Understands images, scenes, and visual context.

Context Window

Context window not disclosed.

Availability

API
Available
Product / App
Available (via ChatGPT)
Open Source
Not released
Enterprise
Contact sales

Pricing Model

Pay per token
Input and output billed separately.
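
Because input and output tokens are billed at separate rates, the cost of a request is the sum of two terms. The sketch below shows that arithmetic only. The function name and both example rates are hypothetical placeholders, not published prices; substitute the rates from the current price list.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_rate_per_million: float,
    output_rate_per_million: float,
) -> float:
    """Estimate a charge when input and output tokens are billed separately."""
    input_cost = input_tokens / 1_000_000 * input_rate_per_million
    output_cost = output_tokens / 1_000_000 * output_rate_per_million
    return input_cost + output_cost


# Hypothetical rates in USD per million tokens (placeholders, NOT real prices).
EXAMPLE_INPUT_RATE = 5.0
EXAMPLE_OUTPUT_RATE = 40.0

if __name__ == "__main__":
    cost = estimate_cost_usd(1_000, 4_000, EXAMPLE_INPUT_RATE, EXAMPLE_OUTPUT_RATE)
    print(f"Estimated cost: ${cost:.4f}")
```

With image models, the output side usually dominates, since the generated image accounts for most of the billed tokens.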

What it feels like

  • First natively multimodal image model — text and image inputs share the same transformer backbone, not a separate diffusion stack
  • 130M users created 700M images in the first week of the ChatGPT image rollout — extreme product-market fit
  • Excels at text rendering, prompt adherence, and using GPT-4o's world knowledge for grounded composition
  • Still struggles with non-Latin scripts (Chinese / Japanese / Arabic) and small in-image text
  • Available via ChatGPT, the Images 2.0 product, and the Responses API as gpt-image-1
  • Successor GPT Image 1.5 (Dec 2025) generates 4x faster — original now feels slow at 10-30s per image
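
As a rough illustration of API access, here is a minimal sketch of a generation request for gpt-image-1 using only the Python standard library. It assumes the OpenAI Images endpoint (`/v1/images/generations`) rather than the Responses API mentioned above, and assumes the image comes back base64-encoded in `data[0].b64_json`. The helper names, prompt, and output path are illustrative.

```python
import base64
import json
import os
import urllib.request

# Assumed endpoint: the OpenAI Images API, not the Responses API.
API_URL = "https://api.openai.com/v1/images/generations"


def build_generation_request(
    prompt: str, api_key: str, size: str = "1024x1024"
) -> urllib.request.Request:
    """Build (but do not send) a POST request asking gpt-image-1 for one image."""
    payload = {"model": "gpt-image-1", "prompt": prompt, "size": size}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def generate_image(prompt: str, out_path: str = "out.png") -> str:
    """Send the request and write the decoded image to out_path."""
    request = build_generation_request(prompt, os.environ["OPENAI_API_KEY"])
    # Generous timeout: the page cites 10-30s per image for the original model.
    with urllib.request.urlopen(request, timeout=120) as response:
        body = json.load(response)
    # Assumption: the image is returned base64-encoded in data[0].b64_json.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(body["data"][0]["b64_json"]))
    return out_path


if __name__ == "__main__":
    # Only hits the network when a key is configured.
    if os.environ.get("OPENAI_API_KEY"):
        print(generate_image("A poster that reads 'Grand Opening' in bold serif type"))
```

Splitting request construction from sending keeps the payload inspectable without making a billable call.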

Best use cases

  • Conversational image generation inside ChatGPT — context carries from prompt to prompt
  • Marketing graphics that need readable English text in the image
  • Image editing with reference inputs (the multimodal input is a real product feature)
  • Workflows that want to use one OpenAI account for text + image without juggling APIs
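
Editing with reference inputs could look roughly like the sketch below. It assumes the official `openai` Python package and its `client.images.edit` call, with gpt-image-1 accepting a list of reference images. The helper names, file paths, and prompt are hypothetical, and the package is imported lazily so the argument-building helper can be inspected without it installed.

```python
import base64
from contextlib import ExitStack
from typing import Any, Sequence


def build_edit_kwargs(prompt: str, reference_files: Sequence[Any]) -> dict:
    """Assemble keyword arguments for an image edit with reference inputs."""
    if not reference_files:
        raise ValueError("at least one reference image is required")
    return {
        "model": "gpt-image-1",
        "image": list(reference_files),
        "prompt": prompt,
    }


def edit_with_references(
    prompt: str, reference_paths: Sequence[str], out_path: str = "edited.png"
) -> str:
    """Send the reference images plus a prompt, then save the edited result."""
    from openai import OpenAI  # third-party package, imported lazily

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with ExitStack() as stack:
        # Keep every reference file open for the duration of the upload.
        files = [stack.enter_context(open(p, "rb")) for p in reference_paths]
        result = client.images.edit(**build_edit_kwargs(prompt, files))
    # Assumption: the edited image is returned base64-encoded.
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(result.data[0].b64_json))
    return out_path


if __name__ == "__main__":
    # Inspect the arguments without calling the API (placeholder file object).
    print(sorted(build_edit_kwargs("Add a red scarf", [object()]).keys()))
```

A call such as `edit_with_references("Place the product on a marble table", ["product.png", "table.png"])` would then combine both references in a single edit, assuming those files exist.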

Not ideal for

  • Languages written in non-Latin scripts, where rendered text is often hallucinated
  • Latency-sensitive bulk generation (10-30s per image; competitors are faster)
  • Pure aesthetic / illustration work where Midjourney V7 still has a stylistic edge
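
To make the bulk-generation latency point concrete, the sketch below estimates wall-clock time for a batch at the 10-30s per image cited above. The concurrency level is an assumption for illustration; real throughput also depends on rate limits, which this simple model ignores.

```python
import math


def batch_wall_time_seconds(
    num_images: int, seconds_per_image: float, concurrency: int = 1
) -> float:
    """Estimate wall-clock time when images are generated in parallel waves."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    # Each wave runs `concurrency` requests at once and takes one image's latency.
    waves = math.ceil(num_images / concurrency)
    return waves * seconds_per_image


if __name__ == "__main__":
    # 1,000 images with 8 parallel requests (assumed), at both ends of the range.
    for latency in (10, 30):
        total = batch_wall_time_seconds(1_000, latency, concurrency=8)
        print(f"{latency}s per image: about {total / 60:.1f} minutes for 1,000 images")
```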
