AUDIO MODEL Mistral AI

Voxtral TTS (v26.03) NEW

Voxtral TTS (v26.03) is an API model from Mistral AI. It is positioned for audio tasks, where results tend to benefit from iteration rather than one-shot answers.

Core Capabilities

Audio: Speech, music, or other audio understanding/synthesis.
Generative: Produces images, video, audio, or other media.

Context Window

Context window not disclosed.

Availability

API: Available
Product / App: Not available
Open Source: Not released
Enterprise: Contact sales

Pricing Model

Pay per token
Input and output billed separately.
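
As a rough illustration of what "input and output billed separately" means in practice, the sketch below sums two independent charges. The rates, the per-million-token unit, and the names `TokenPricing` and `estimate_cost` are all assumptions made for this example. The listing does not publish prices, and it does not say how tokens are counted for audio output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical per-token rates; NOT Mistral AI's published prices."""
    input_price_per_million: float   # currency units per 1M input tokens (assumed)
    output_price_per_million: float  # currency units per 1M output tokens (assumed)


def estimate_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Bill input and output separately, then sum the two charges."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * pricing.input_price_per_million
    output_cost = output_tokens / 1_000_000 * pricing.output_price_per_million
    return input_cost + output_cost


# Placeholder rates chosen only to make the arithmetic easy to follow.
example = TokenPricing(input_price_per_million=2.0, output_price_per_million=8.0)
cost = estimate_cost(example, input_tokens=500_000, output_tokens=250_000)
print(f"{cost:.2f}")
```

With these placeholder rates, the input charge is 1.00 and the output charge is 2.00. Substitute the provider's actual rates and token counts before relying on any figure.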

What it feels like

  • Audio model from Mistral AI; see the linked sources below for benchmark and review coverage
  • Audio synthesis or transcription per the published model card

Best use cases

  • Audio synthesis / transcription tasks per the model card
  • See the model spec and sources block for benchmarked use cases

Not ideal for

  • Tasks far outside the modalities listed in this model's spec
  • Workflows where a more recent successor in the same family scores higher