SAM 3
Promptable Concept Segmentation
Meta's November 2025 segmentation model and the third generation of SAM (Segment Anything Model). New capability: you can prompt with a text phrase ("the red car in the background") and SAM 3 segments every matching object in the image. Earlier SAM versions accepted only visual prompts such as points and boxes. The SAM 3 family, including SAM 3.1 and SAM 3D for 3D shape inference, is deployed in Facebook Marketplace's "View in Room" and Instagram Edits.
Cost
Free
Open weights — self-host
How are Intelligence, Speed & Cost bucketed?
Intelligence and Speed buckets are percentile ranks on Artificial Analysis. Cost buckets are fixed dollar thresholds keyed off output-token price in dollars per million output tokens ($/M out).
Intelligence
- Top 1%: ≤ 1%
- Top 5%: ≤ 5%
- Top 10%: ≤ 10%
- Good: ≤ 25%
- Medium: ≤ 50%
- Below avg: > 50%
Speed
- Top 1%: ≥ 345 tok/s
- Top 5%: ≥ 237 tok/s
- Top 10%: ≥ 196 tok/s
- Good: ≥ 146 tok/s
- Medium: ≥ 90 tok/s
- Slow: < 90 tok/s
Cost
- Free: open weights · self-host
- Low: < $1 / M out
- Moderate: $1–5 / M out
- High: ≥ $5 / M out
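The bucket rules above amount to ordered threshold checks. The sketch below encodes them using only the thresholds listed on this card. It assumes (this is not stated by the site) that intelligence is passed in as a "top X%" rank where lower is better, and that an open-weights model is labeled Free regardless of any hosted price. The function names are hypothetical.

```python
# Hypothetical helpers mirroring the bucket thresholds listed above.

# (upper bound on "top X%" rank, label), checked from best to worst
INTELLIGENCE_BUCKETS = [(1, "Top 1%"), (5, "Top 5%"), (10, "Top 10%"),
                        (25, "Good"), (50, "Medium")]

# (minimum output tokens per second, label), checked from fastest to slowest
SPEED_BUCKETS = [(345, "Top 1%"), (237, "Top 5%"), (196, "Top 10%"),
                 (146, "Good"), (90, "Medium")]


def intelligence_bucket(top_percent: float) -> str:
    """Map a percentile rank (e.g. 3.0 means top 3%) to its label."""
    for limit, label in INTELLIGENCE_BUCKETS:
        if top_percent <= limit:
            return label
    return "Below avg"  # anything worse than the top 50%


def speed_bucket(tok_per_s: float) -> str:
    """Map measured output speed (tokens/second) to its label."""
    for floor, label in SPEED_BUCKETS:
        if tok_per_s >= floor:
            return label
    return "Slow"  # below 90 tok/s


def cost_bucket(usd_per_m_out: float, open_weights: bool = False) -> str:
    """Map output-token price ($ per million output tokens) to its label.

    Assumption: open-weights models are always bucketed as Free.
    """
    if open_weights:
        return "Free"
    if usd_per_m_out < 1:
        return "Low"
    if usd_per_m_out < 5:
        return "Moderate"  # $1 up to, but not including, $5
    return "High"
```

For example, `speed_bucket(200)` returns `"Top 10%"` and `cost_bucket(5.0)` returns `"High"`, because each boundary value belongs to the bucket whose rule uses `≤` or `≥`.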
Why it matters
Lets many CV workflows replace manual annotation pipelines with text prompts. Pushes computer vision toward "describe what you want" rather than "label thousands of examples."
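To show how a text prompt can stand in for hand labeling, here is a minimal sketch that turns the instances matched by one phrase into box annotations. It does not use SAM 3's real API. It assumes each matched instance arrives as a binary mask (a list of rows of booleans) plus a confidence score. `mask_to_annotation`, `annotate`, and the `min_score` cutoff are all hypothetical. The `bbox` field follows the COCO `[x, y, width, height]` pixel convention.

```python
from typing import Dict, List

Mask = List[List[bool]]  # hypothetical: one H x W binary mask per matched instance


def mask_to_annotation(mask: Mask, category: str, score: float) -> Dict:
    """Convert one binary instance mask into a COCO-style box record."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, on in enumerate(row) if on]
    if not ys:
        raise ValueError("empty mask")
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    return {
        "category": category,                          # the text prompt itself
        "bbox": [x0, y0, x1 - x0 + 1, y1 - y0 + 1],    # [x, y, width, height]
        "area": sum(sum(row) for row in mask),         # number of mask pixels
        "score": score,
    }


def annotate(masks: List[Mask], scores: List[float], phrase: str,
             min_score: float = 0.5) -> List[Dict]:
    """Keep every instance matched by `phrase` whose score clears the cutoff."""
    return [mask_to_annotation(m, phrase, s)
            for m, s in zip(masks, scores) if s >= min_score]


# Usage: two instances returned for one prompt; the low-score one is dropped.
masks = [
    [[False, True, True], [False, True, False]],
    [[True, False, False], [False, False, False]],
]
records = annotate(masks, [0.9, 0.3], "red car")
```

Here `records` holds a single entry with `bbox` `[1, 0, 2, 2]` and `area` 3. The score cutoff is where a human reviewer would typically tune precision against recall.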
Core Capabilities
Vision
Understands images, scenes, and visual context.
Segmentation
Produces pixel masks for every object matching a prompt.
Context Window
Context window not disclosed.
Availability
API
Available
Product / App
Not available
Open Source
Released
Enterprise
Contact sales
Pricing Model
Free / self-host
Open weights — pay only for compute.
Self-host
What it feels like
- Promptable segmentation model from Meta AI: type a phrase, get masks for every matching object
- Best suited to segmentation and annotation work rather than open-ended image understanding
Best use cases
- Segmenting every instance of a described concept in an image from a text prompt
- Generating masks for dataset annotation instead of labeling examples by hand
Not ideal for
- Tasks outside segmentation and the modalities listed in this model's spec
- Workflows where a newer SAM release scores higher on the target task