
Developing Voice-Activated Software Applications


, and Administrator

August 26, 2025, 3:43 PM



Updated Crowdsourced Speech Dataset Now Available: 13,905 Hours of Speech in 76 Languages

Nvidia and Mozilla have updated the crowdsourced speech dataset published through the Mozilla Common Voice project, making it one of the world's largest open speech datasets. The new release contains 13,905 hours of speech in 76 languages, recorded by 182,000 unique voices.

This extensive dataset includes demographic information such as age, gender, and accent, making it a valuable resource for developing voice-enabled services and AI models in various languages, including less commonly represented ones. The dataset now includes 16 new languages: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani, and Hausa.
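As a rough illustration of how the demographic fields might be used, the sketch below tallies clips by gender and counts distinct voices in a tab-separated metadata file. The sample rows and the column names (`client_id`, `path`, `sentence`, `age`, `gender`, `accents`) are assumptions made for this example; check the header of the release you download before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the shape of a speech-dataset metadata file.
# The column names below are assumed for illustration only.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\taccents\n"
    "a1\tclip_0001.mp3\tHello world\ttwenties\tfemale\t\n"
    "b2\tclip_0002.mp3\tGood morning\tthirties\tmale\tScottish English\n"
    "a1\tclip_0003.mp3\tThank you\ttwenties\tfemale\t\n"
    "c3\tclip_0004.mp3\tSee you soon\t\t\t\n"
)


def summarize(tsv_text):
    """Return (clips per gender, number of distinct voices)."""
    # QUOTE_NONE keeps quotation marks inside sentences from confusing the parser.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    by_gender = Counter()
    voices = set()
    for row in reader:
        # Demographic fields are optional, so empty values are grouped together.
        by_gender[row["gender"] or "unspecified"] += 1
        voices.add(row["client_id"])
    return by_gender, len(voices)


by_gender, n_voices = summarize(SAMPLE_TSV)
print(dict(by_gender), n_voices)
```

Grouping empty values under "unspecified" matters in practice because contributors are not required to supply demographic details, so a model trained only on labeled rows would see a smaller and possibly skewed subset.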

The updated dataset can be downloaded from the Mozilla Common Voice website or repository. It is released under an open license for both research and commercial use, and Mozilla encourages contributors and developers to help expand and improve it.

Additionally, Mozilla provides toolkits, such as those for transcribing audio using open-source Whisper models, to support working with the data securely and privately. For those looking to collaborate or use the dataset for enterprise purposes, Mozilla can provide details on licensing and data access.
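To give a sense of what local transcription with an open-source Whisper model can look like, here is a minimal sketch. It is not Mozilla's toolkit: it assumes the separately installed `openai-whisper` Python package (which also needs `ffmpeg`), and the helper names `load_whisper_transcriber` and `transcribe_clips` are hypothetical. Because the model runs on the local machine, the audio does not need to leave it.

```python
from typing import Callable, Dict, Iterable


def load_whisper_transcriber(model_name: str = "base") -> Callable[[str], str]:
    """Build a path -> text function backed by a locally run Whisper model."""
    # Assumed dependency: pip install openai-whisper (requires ffmpeg).
    # Imported lazily so the rest of the module works without it.
    import whisper

    model = whisper.load_model(model_name)
    return lambda path: model.transcribe(path)["text"].strip()


def transcribe_clips(
    paths: Iterable[str], transcribe: Callable[[str], str]
) -> Dict[str, str]:
    """Apply a transcription function to each clip and collect the results."""
    return {path: transcribe(path) for path in paths}


# Example usage (downloads model weights on first run, so it is not executed here):
# transcripts = transcribe_clips(["clip_0001.mp3"], load_whisper_transcriber())
```

Passing the transcription function in as an argument keeps the batching logic independent of any one model, so a different speech-to-text backend could be swapped in without changing `transcribe_clips`.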

Mozilla is also working on an initiative to create a data collective ("marketplace") to facilitate controlled sharing and licensing of curated datasets, which may include this speech dataset in the future.

Toolkits from Mozilla and EleutherAI, available on platforms such as Mozilla.ai Blueprints, can also be used to access the data or to build similar datasets.

In short, the updated Mozilla Common Voice dataset, with 13,905 hours of speech in 76 languages, supports the development of voice-enabled services and AI models. It is open for research use, contributors are encouraged to enhance and curate it with Mozilla's toolkits, and it may later be offered through Mozilla's planned data collective.
