Speech recognition model, Voxtral, rolled out by Mistral

Released under the Apache license, the more affordable model challenges pricier alternatives

### Voxtral Takes the Lead in ASR Market with Cost-Effective and Advanced Features

Mistral, the French AI company, has released Voxtral, an open-source family of automatic speech recognition (ASR) models. The release aims to disrupt the ASR market by offering state-of-the-art accuracy, native semantic understanding, and multilingual capabilities at a lower cost than its rivals.

#### Features Comparison

Voxtral's features set it apart from other ASR solutions. According to Mistral, Voxtral outperforms Whisper large-v3, the current leading open-source speech transcription model, in terms of accuracy and error rate [2][4]. Voxtral also has native semantic understanding, with built-in question-answering and summarization capabilities [5]. Furthermore, Voxtral is natively multilingual and automatically detects widely used languages such as English, Spanish, French, Portuguese, Hindi, German, Dutch, and Italian, among others [5]. In contrast, Whisper supports multiple languages but may require additional setup, while other APIs such as ElevenLabs Scribe can be less efficient than Voxtral [5].

#### Pricing Comparison

Voxtral offers a significant cost advantage over its competitors. Its pricing starts at $0.001 per minute and tops out at about $0.004 per minute. By comparison, OpenAI charges $0.006 per minute for its Whisper model and $0.003 per minute for its gpt-4o-mini-transcribe model [2]. These savings make Voxtral a compelling choice for businesses looking to reduce costs while maintaining high-quality speech recognition.
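To put the per-minute figures in perspective, here is a minimal Python sketch that converts them into a total bill for a given volume of audio. The rates are the ones quoted in this article, not verified against current price lists, and the function and label names are hypothetical, chosen only for illustration.

```python
# Per-minute rates in USD as quoted in this article (assumed, not verified
# against the vendors' current price lists).
RATES_PER_MINUTE = {
    "Voxtral (starting price)": 0.001,
    "Voxtral (upper price)": 0.004,
    "OpenAI gpt-4o-mini-transcribe": 0.003,
    "OpenAI Whisper": 0.006,
}


def transcription_cost(minutes, rate_per_minute):
    """Return the cost in USD of transcribing `minutes` of audio, rounded to cents."""
    return round(minutes * rate_per_minute, 2)


def compare_costs(hours):
    """Return a mapping of label -> cost in USD for `hours` of audio."""
    minutes = hours * 60
    return {
        label: transcription_cost(minutes, rate)
        for label, rate in RATES_PER_MINUTE.items()
    }


if __name__ == "__main__":
    # Hypothetical workload: 1,000 hours of recorded audio.
    for label, cost in compare_costs(1000).items():
        print(f"{label}: ${cost:,.2f}")
```

At these quoted rates, 1,000 hours of audio (60,000 minutes) would cost $60 at Voxtral's starting price and $360 with OpenAI's Whisper. Real bills depend on which model tier is used and on any volume pricing, neither of which this sketch models.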

#### Performance Comparison

Mistral reports strong benchmark results for Voxtral. According to the company, it surpasses ElevenLabs Scribe and demonstrates strong multilingual capabilities [2][5]. Voxtral is also said to beat GPT-4o mini Transcribe and Gemini 2.5 Flash across all tasks [2]. Additionally, Mistral claims that Voxtral comprehensively outperforms Whisper large-v3 in transcription accuracy [2].

#### Looking Forward

Mistral is hoping businesses will pay to use its ASR technology for their applications. The company is offering to help companies set up Voxtral for production-scale inference in private infrastructure and to help tune models for industry-specific applications [1]. Mistral is also looking for potential partners who can provide additional functionality like speaker identification or emotion detection in model deployments.

In conclusion, Voxtral offers a powerful and cost-effective solution for businesses in need of high-quality speech recognition and understanding. Its open-source nature, advanced features, and significant cost savings make it a compelling choice for applications requiring accurate and efficient speech transcription.

Voxtral, Mistral's new open-source ASR offering, provides advanced speech recognition features while remaining cost-effective compared to its competitors. Its multilingual capabilities and native semantic understanding, including built-in question answering and summarization, set it apart from other ASR solutions on the market.
