Language model · Mar 2022 · Google/DeepMind

Chinchilla

Compute-Optimal LLM Training

A DeepMind study showing that, contrary to two years of received wisdom, most existing large language models were under-trained. Chinchilla, a 70B model trained on roughly 4× more data, outperformed the 280B Gopher model on the same compute budget. The field had been building models too large relative to their data budgets.
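The trade-off can be checked with rough arithmetic. The sketch below assumes the common approximation that training compute is about 6 × parameters × tokens, and the widely quoted rule of thumb of about 20 training tokens per parameter. The token counts (300B for the 280B model, 1.4T for the 70B model) are publicly reported figures rather than values from this page, and the function names are illustrative.

```python
import math

# Assumption: rule-of-thumb ratio of training tokens to parameters.
TOKENS_PER_PARAM = 20.0


def train_flops(params: float, tokens: float) -> float:
    """Estimate training compute as 6 * parameters * tokens (an approximation)."""
    return 6.0 * params * tokens


def compute_optimal(flops: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) under the 20 tokens/param rule.

    Solves 6 * N * (TOKENS_PER_PARAM * N) = flops for N.
    """
    params = math.sqrt(flops / (6.0 * TOKENS_PER_PARAM))
    return params, TOKENS_PER_PARAM * params


# Publicly reported sizes, used here only for illustration.
models = {
    "280B model (Gopher)": (280e9, 300e9),
    "70B model (Chinchilla)": (70e9, 1.4e12),
}

for name, (params, tokens) in models.items():
    flops = train_flops(params, tokens)
    print(f"{name}: {flops:.2e} FLOPs, {tokens / params:.1f} tokens per parameter")

# What the rule of thumb suggests for the 70B model's budget.
n_opt, d_opt = compute_optimal(train_flops(70e9, 1.4e12))
print(f"Suggested split: {n_opt:.2e} parameters, {d_opt:.2e} tokens")
```

Under these assumptions both models land near 5 to 6 × 10^23 FLOPs, but the smaller one sees about 20 tokens per parameter while the larger one sees about 1.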

Context

Up to 2,048 tokens

Why it matters

If the model on your phone is smaller than GPT-3 yet just as capable as GPT-3 was in 2020, you are benefiting from Chinchilla. Efficient small models such as Phi, Mistral 7B, and Gemma build on its lesson that training data matters as much as parameter count.

Core Capabilities

Research

Foundational paper or scientific contribution.

Long Documents

Not a strength: the 2,048-token context fits a short prompt, not entire codebases, books, or multi-doc RAG.

Context Window

2k tokens (short prompt)

Scale: 4k Chat · 32k Long docs · 128k Books · 400k Multi-doc · 1M Codebase · 10M

Availability

API

Not available

Product / App

Not available

Open Source

Not released

Enterprise

—

Pricing Model

Research artifact

Not commercially released.

Research

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Context / memory (context window size, log-scaled)

Lower 20% (20th percentile; 20% of models score below this): 6.0
Upper 80% (80th percentile; only 20% of models score above this): 9.0
This model: 0.0

Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.
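How a log-scaled 0–10 context score might work can be sketched as follows. The exact formula is not stated on this page; the endpoints (2,048 and 10,000,000 tokens) are assumptions taken from the ends of the context window scale, and `context_score` is a hypothetical name.

```python
import math

# Assumed endpoints, taken from the 2k-to-10M context window scale.
MIN_CTX = 2_048
MAX_CTX = 10_000_000


def context_score(tokens: int) -> float:
    """Map a context window size to 0-10 on a log scale, clamped to the endpoints."""
    clamped = min(max(tokens, MIN_CTX), MAX_CTX)
    fraction = math.log(clamped / MIN_CTX) / math.log(MAX_CTX / MIN_CTX)
    return round(10.0 * fraction, 1)


print(context_score(2_048))  # 0.0, the bottom of the assumed scale
```

With these endpoints a 2,048-token window scores 0.0, which is consistent with the score shown for this model; scores for other window sizes depend entirely on the assumed endpoints.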

What it feels like

A language model from DeepMind; see the linked sources for benchmark and review coverage.

Best use cases

Research reference for compute-optimal training; the model itself was not released for deployment
See the model spec and sources block for benchmarked use cases

Not ideal for

Tasks far outside the modalities listed in this model's spec
Workflows where a more recent successor in the same family scores higher

Hoffmann, J. · Borgeaud, S. · Mensch, A. · et al.