LANGUAGE MODEL · Mar 2024 · Cognition · Last updated: Apr 29, 2026

Devin

First Autonomous Software Engineer Agent

Cognition AI's autonomous software engineering (SWE) agent, first shown as a demo in March 2024 and made generally available in December 2024. Devin runs in a sandboxed Linux environment with a shell, a browser, and a code editor, and can carry multi-hour software engineering tasks through unsupervised. Its launch SWE-bench score of 13.86% was roughly 7× the previous best unassisted result (1.96%), which kicked off the agentic-coding arms race.

Why it matters

Defined the autonomous-coding-agent category. Whether or not Devin itself dominates long-term, later coding-agent products (Cursor Composer, Windsurf Cascade, Claude Code, Codex CLI) largely follow Devin's framing of an "agent in a sandbox" as the right product shape.
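The "agent in a sandbox" shape reduces to a loop. The model picks a tool call, the environment executes it, and the resulting observation is appended to the transcript the model sees on the next step. The sketch below is a generic illustration of that plan-act-observe loop, not Cognition's implementation. Everything in it is hypothetical:

* the `Action` type;
* the `write_file`, `run_python`, and `finish` tools;
* `scripted_policy`, which replays a fixed plan where a real agent would call a language model.

A temporary directory stands in for Devin's sandboxed Linux environment. It provides no real isolation.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List


@dataclass
class Action:
    # Hypothetical tool-call format: tool name plus arguments.
    tool: str          # "write_file", "run_python", or "finish"
    arg: str = ""      # file name for write_file / run_python
    content: str = ""  # file body for write_file


def run_agent(policy: Callable[[List[str]], Action], max_steps: int = 10) -> List[str]:
    """Plan-act-observe loop; returns the transcript of observations."""
    history: List[str] = []
    # A throwaway working directory stands in for the sandbox (no isolation).
    with tempfile.TemporaryDirectory() as workdir:
        for _ in range(max_steps):
            action = policy(history)  # the "model" chooses the next action
            if action.tool == "finish":
                history.append("finished")
                break
            if action.tool == "write_file":
                Path(workdir, action.arg).write_text(action.content)
                obs = f"wrote {action.arg}"
            elif action.tool == "run_python":
                proc = subprocess.run(
                    [sys.executable, action.arg],
                    cwd=workdir, capture_output=True, text=True, timeout=30,
                )
                obs = f"exit={proc.returncode} stdout={proc.stdout.strip()}"
            else:
                obs = f"unknown tool {action.tool}"
            history.append(obs)  # the observation feeds the next decision
    return history


def scripted_policy(history: List[str]) -> Action:
    # Hypothetical stand-in for the language model: replays a fixed plan,
    # indexed by how many observations have been recorded so far.
    plan = [
        Action("write_file", "fix.py", "print(sum([1, 2, 3]))\n"),
        Action("run_python", "fix.py"),
        Action("finish"),
    ]
    return plan[len(history)]


print(run_agent(scripted_policy))
```

Running it prints `['wrote fix.py', 'exit=0 stdout=6', 'finished']`. In a real agent, the policy would read those observations (test output, errors, page contents) and decide what to try next. The `max_steps` cap bounds a run that never reaches `finish`.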

Core Capabilities

Agent Workflows

Built for tool use and autonomous tasks.

Generative

Produces images, video, audio, or other media.

Coding

Strong real-world software engineering.

Multimodal

Combines text, vision, and audio in one model.

Context Window

Context window not disclosed.

Availability

API

Not available

Product / App

Available

Open Source

Not released

Enterprise

Contact sales

Pricing Model

Subscription

Bundled inside the host product.

Capability / Performance

Where this model sits relative to the middle 60% of models in the tree. All scores are 0–10 (higher is better).

Legend: Lower 20% marks the 20th percentile (20% of models score below it). Upper 80% marks the 80th percentile (only 20% of models score above it). This model marks where the current model lands. Percentile boundaries are computed across every model in the tree that reports the underlying benchmark for each capability.

What it feels like

Autonomous coding agent from Cognition; see the linked sources below for benchmark and review coverage
Best suited to code-heavy workloads, per Cognition's published materials
Built around tool use and multi-step agent loops

Best use cases

Coding workflows that match the model's published benchmarks
Agent / tool-use workflows that match the model's published benchmarks
See the model spec and sources block for benchmarked use cases

Not ideal for

Tasks far outside the modalities listed in this model's spec
Workflows where a more recent successor in the same family scores higher