Figure Helix
Humanoid VLA
Figure's vision-language-action (VLA) model for the Figure 02 and 03 humanoid robots. Helix uses a two-system architecture: System 1 handles fast motor control at 200 Hz, while System 2 reasons about longer-horizon goals. Released February 2025, with a Helix 02 refresh later in the year.
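The two-rate split described above can be illustrated with a small simulation. This is a minimal sketch, not Figure's implementation: the 8 Hz slow rate, the `Latent` type, and the functions `system2_plan`, `system1_act`, and `run` are all hypothetical stand-ins chosen for illustration. Only the 200 Hz fast rate comes from the description.

```python
from dataclasses import dataclass

FAST_HZ = 200  # System 1 rate, taken from the description above
SLOW_HZ = 8    # System 2 rate: an assumed value for illustration only


@dataclass
class Latent:
    """Hypothetical goal signal handed from the slow system to the fast one."""
    step_created: int
    target: float


def system2_plan(step: int, instruction: str) -> Latent:
    # Stand-in for slow, longer-horizon reasoning (not Figure's model):
    # derive a toy numeric target from the instruction text.
    return Latent(step_created=step, target=float(len(instruction)))


def system1_act(latent: Latent, position: float) -> float:
    # Stand-in for the fast motor policy: move 5% of the way to the target.
    return position + 0.05 * (latent.target - position)


def run(instruction: str, seconds: float = 1.0):
    """Simulate the fast loop, refreshing the slow system's latent periodically."""
    ticks_per_plan = FAST_HZ // SLOW_HZ  # fast ticks between slow updates
    total_ticks = int(seconds * FAST_HZ)
    position = 0.0
    latent = None
    replans = 0
    for step in range(total_ticks):
        if step % ticks_per_plan == 0:
            # The slow system refreshes the goal; the fast loop keeps running
            # on the most recent latent in between refreshes.
            latent = system2_plan(step, instruction)
            replans += 1
        position = system1_act(latent, position)
    return total_ticks, replans, position


if __name__ == "__main__":
    ticks, replans, position = run("pick up the cup")
    print(f"fast ticks={ticks}, slow replans={replans}, position={position:.3f}")
```

The point of the sketch is the scheduling: the fast loop never waits on the slow one, it simply reuses the latest goal until a new one arrives. A real system would run the two models concurrently rather than in one loop.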
Why it matters
Demonstrated fast-loop (200 Hz) VLA control, a latency regime that cloud-hosted models like Gemini Robotics can't match. Helped popularize the two-system control pattern that other humanoid teams have begun adopting.
Core Capabilities
Agent Workflows
Built for tool use and autonomous tasks.
Multimodal
Combines text, vision, and audio in one model.
Generative
Produces images, video, audio, or other media.
Vision
Understands images, scenes, and visual context.
Context Window
Context window not disclosed.
Availability
API
Not available
Product / App
Available
Open Source
Not released
Enterprise
Contact sales
Pricing Model
Subscription
Bundled inside the host product.
What it feels like
- Vision-language-action model from Figure AI; see the linked sources below for benchmark and review coverage
- Tool-use and agent loops are the typical fit per the published model card
- Vision and multimodal tasks are the typical fit per the published model card
Best use cases
- Agent / tool-use workflows that match the model's published benchmarks
- Vision tasks (charts, documents, images) per the model card
- See the model spec and sources block for benchmarked use cases
Not ideal for
- Tasks far outside the modalities listed in this model's spec
- Workflows where a more recent successor in the same family scores higher