AUDIO MODEL Dec 2023 Google/DeepMind Last updated: Apr 29, 2026

Gemini 1

Google's Native Multimodal Frontier

Google's first model family launched under the Gemini brand, designed natively to handle text, images, audio, and video in a single integrated system. Released in three sizes: Ultra (flagship), Pro (balanced, deployed in Bard), and Nano (1.8B and 3.25B parameters, on Pixel devices). Marketed as the first model to surpass GPT-4 on MMLU, though under contested benchmarking conditions: Google's headline 90.0% for Ultra used chain-of-thought with 32 samples (CoT@32), while the GPT-4 figure it was compared against was a 5-shot result.


Intelligence

Good

Speed

Top 5%

296 tok/s output

Cost

Moderate

$0.25 in / $1.50 out

Context

33K

Up to 32,768 tokens

How are Intelligence, Speed & Cost bucketed?

Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price ($/M out).

Intelligence

Top 1%: ≤ 1%
Top 5%: ≤ 5%
Top 10%: ≤ 10%
Good: ≤ 25%
Medium: ≤ 50%
Below avg: > 50%

Speed

Top 1%: ≥ 345 tok/s
Top 5%: ≥ 237 tok/s
Top 10%: ≥ 196 tok/s
Good: ≥ 146 tok/s
Medium: ≥ 90 tok/s
Slow: < 90 tok/s

Cost

Free: open weights · self-host
Low: < $1 / M out
Moderate: $1–5 / M out
High: ≥ $5 / M out
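The Speed and Cost thresholds above can be read as a simple lookup. The following is a minimal Python sketch, assuming the speed cut-offs are inclusive lower bounds and that "Moderate" covers $1 up to but not including $5. The function names and the open_weights flag are illustrative and not part of any Artificial Analysis API.

```python
# Speed thresholds (tok/s) copied from the Speed list above, highest first.
SPEED_BUCKETS = [
    (345, "Top 1%"),
    (237, "Top 5%"),
    (196, "Top 10%"),
    (146, "Good"),
    (90, "Medium"),
]


def speed_bucket(tokens_per_second: float) -> str:
    """Return the first bucket whose lower bound the throughput meets."""
    for floor, label in SPEED_BUCKETS:
        if tokens_per_second >= floor:
            return label
    return "Slow"


def cost_bucket(output_price_per_million: float, open_weights: bool = False) -> str:
    """Bucket by output-token price in $/M; open-weights models count as Free."""
    if open_weights:  # assumption: "Free" is decided by licensing, not price
        return "Free"
    if output_price_per_million < 1:
        return "Low"
    if output_price_per_million < 5:
        return "Moderate"
    return "High"


# This page's own figures: 296 tok/s output and $1.50 / M out.
print(speed_bucket(296))  # Top 5%
print(cost_bucket(1.50))  # Moderate
```

Both results match the Speed and Cost tiles shown for this model.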

Official ↗ Artificial Analysis ↗

Why it matters

Gemini 1 was less a technical breakthrough than a strategic reset: Google reorganized its AI efforts under DeepMind's Demis Hassabis (after the April 2023 merger of Google Brain and DeepMind), and Gemini was the first product of that consolidation.

Core Capabilities

Long Documents

Handles entire codebases, books, and multi-doc RAG.

Multimodal

Combines text, vision, and audio in one model.

Generative

Produces images, video, audio, or other media.

Agent Workflows

Built for tool use and autonomous tasks.

Context Window

33k tokens

≈ 25 pages

4k: Chat
32k: This model
128k: Books
400k: Multi-doc
1M: Codebase
10M
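To make the scale concrete, the sketch below estimates how many 32,768-token windows a longer input would need. It reuses this page's own ratio of 32,768 tokens ≈ 25 pages as a rough conversion. Real token counts depend on the tokenizer and the text. The function names and the reserved_for_output parameter are illustrative assumptions.

```python
CONTEXT_WINDOW = 32_768   # tokens, from this page
PAGES_PER_WINDOW = 25     # this page's "≈ 25 pages" estimate


def pages_to_tokens(pages: int) -> int:
    """Rough page-to-token conversion using the page's own ratio."""
    return pages * CONTEXT_WINDOW // PAGES_PER_WINDOW


def windows_needed(total_tokens: int, reserved_for_output: int = 0) -> int:
    """Number of context windows needed to cover total_tokens.

    reserved_for_output is a hypothetical budget held back for the reply,
    assuming input and output share the same window.
    """
    usable = CONTEXT_WINDOW - reserved_for_output
    if usable <= 0:
        raise ValueError("reserved_for_output must be smaller than the window")
    return -(-total_tokens // usable)  # ceiling division


tokens = pages_to_tokens(1_500)   # a hypothetical 1,500-page document
print(tokens)                     # 1966080
print(windows_needed(tokens))     # 60
```

Under these assumptions, a 1,500-page document would have to be split into about 60 chunks to pass through a 32,768-token window.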

Availability

API

Available

Product / App

Available

Open Source

Not released

Enterprise

Contact sales

Pricing Model

Pay per token

Input and output billed separately.
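As a worked example of separate input and output billing, the sketch below prices a single request at this page's listed rates ($0.25 in / $1.50 out), assuming both are per million tokens as in the cost-bucket definitions. The function name and the token counts are hypothetical.

```python
INPUT_PRICE_PER_M = 0.25    # $ per million input tokens (listed on this page)
OUTPUT_PRICE_PER_M = 1.50   # $ per million output tokens (listed on this page)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, with input and output billed separately."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 20,000 tokens in, 1,000 tokens out.
cost = request_cost(20_000, 1_000)
print(f"${cost:.4f}")  # $0.0065
```

Output tokens cost six times as much as input tokens at these rates, so long replies dominate the bill faster than long prompts.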


Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Lower 20% Upper 80% This model

Quality

AA Intelligence Index · scaled to 10

1.7

5.6

4.8

Speed

Output throughput · log-scaled

10.0

Cost efficiency

Input price ($/M tokens) · cheaper scores higher

6.2

10.0

Consistency

No data reported · placeholder

5.0

Lower 20%: 20th percentile; 20% of models score below this.
This model: where the current model lands.
Upper 80%: 80th percentile; only 20% of models score above this.
Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
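The band comparison described here can be sketched as a three-way check. The example assumes the three Quality values are listed in legend order (lower 20% = 1.7, upper 80% = 5.6, this model = 4.8), which the page does not state explicitly. The function name is hypothetical.

```python
def band_position(score: float, p20: float, p80: float) -> str:
    """Place a 0-10 score relative to the 20th and 80th percentile marks."""
    if score < p20:
        return "below the middle 60%"
    if score > p80:
        return "above the middle 60%"
    return "within the middle 60%"


# Quality row, under the legend-order assumption stated above.
print(band_position(4.8, p20=1.7, p80=5.6))  # within the middle 60%
```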

What it feels like

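Original Gemini 1: historical, superseded by Gemini 1.5 and later. The long-context figures below describe the Gemini 1.5 successor covered in the linked reviews, not this model's 32,768-token window.
Gemini 1.5 was the first model where 1M-token context was a real product feature, not a benchmark headline
Gemini 1.5 reached 99% needle-in-a-haystack accuracy at 1M tokens, and 99.2% even at 10M in research configurations
Multi-needle recall drops to ~60%: single-fact retrieval is solid, multi-fact retrieval is harder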

Reviews: Google — Introducing Gemini 1.5 ↗ · Google — Gemini 1.5 Pro May 2024 update ↗ · Google Cloud — Needle in Haystack test ↗

Best use cases

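With the 1M-token context of Gemini 1.5 (inputs this large do not fit in Gemini 1's 32,768-token window):
Whole-codebase analysis (30K+ lines) without tedious chunking pipelines
Document QA over 1,500-page PDFs or batches of 100 emails
Hour-long video summarisation and audio transcription QA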

Tools to try

Gemini app · AI Studio · Vertex AI

Not ideal for

Tasks requiring multi-fact retrieval across long contexts (recall drops sharply)
Pure short-context chat — Flash variants are cheaper and faster
