Developing Voice-Activated Software Applications
Updated Crowdsourced Speech Dataset Now Available: 13,905 Hours of Speech in 76 Languages
Nvidia and Mozilla have recently updated a well-known crowdsourced speech dataset, making it one of the world's largest open speech datasets. The updated dataset, available through the Mozilla Common Voice project, contains 13,905 hours of speech in 76 languages from 182,000 unique voices.
This extensive dataset includes demographic information such as age, gender, and accent, making it a valuable resource for developing voice-enabled services and AI models in various languages, including less commonly represented ones. The dataset now includes 16 new languages: Basaa, Slovak, Northern Kurdish, Bulgarian, Kazakh, Bashkir, Galician, Uyghur, Armenian, Belarusian, Urdu, Guarani, Serbian, Uzbek, Azerbaijani, and Hausa.
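As an illustration of how the demographic fields might be used, here is a minimal Python sketch that counts clips per gender and counts distinct speakers in a tab-separated metadata file. The column names (`client_id`, `path`, `sentence`, `age`, `gender`, `accents`) and the sample rows are assumptions made for this example, not taken from the release itself; check the headers of the files you download before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the column names and values are assumptions
# made for illustration and may differ from the actual release files.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\taccents\n"
    "a1\tclip_0001.mp3\tHello world\ttwenties\tfemale\tScottish English\n"
    "a1\tclip_0002.mp3\tGood morning\ttwenties\tfemale\tScottish English\n"
    "b2\tclip_0003.mp3\tHow are you\tfifties\tmale\t\n"
    "c3\tclip_0004.mp3\tSee you soon\t\t\t\n"
)


def read_rows(tsv_file):
    """Yield one dict per clip from a tab-separated metadata file."""
    # QUOTE_NONE keeps quotation marks inside sentences from confusing the parser.
    return csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)


def summarize(tsv_file, field):
    """Count clips per value of `field`; blank values become 'unspecified'."""
    counts = Counter()
    for row in read_rows(tsv_file):
        value = (row.get(field) or "").strip() or "unspecified"
        counts[value] += 1
    return counts


def unique_speakers(tsv_file):
    """Count distinct speaker identifiers in the metadata file."""
    return len({row["client_id"] for row in read_rows(tsv_file)})


if __name__ == "__main__":
    print(summarize(io.StringIO(SAMPLE_TSV), "gender"))
    print(unique_speakers(io.StringIO(SAMPLE_TSV)))
```

With a real download, the `io.StringIO(SAMPLE_TSV)` argument would be replaced by an opened metadata file, for example `open(path, encoding="utf-8", newline="")`. Because contributors supply demographic details voluntarily, many rows may be blank, which is why the sketch groups them under `unspecified` rather than dropping them.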
Interested individuals can access the updated dataset by visiting the Mozilla Common Voice website or repository. The data is openly available for research and commercial use under an open license. Mozilla encourages contributors and developers to participate in expanding and improving the dataset.
Additionally, Mozilla provides toolkits, such as those for transcribing audio using open-source Whisper models, to support working with the data securely and privately. For those looking to collaborate or use the dataset for enterprise purposes, Mozilla can provide details on licensing and data access.
Mozilla is also working on an initiative to create a data collective ("marketplace") to facilitate controlled sharing and licensing of curated datasets, which may include this speech dataset in the future.
For more information and to download the dataset, visit the Mozilla Common Voice website. Toolkits from Mozilla and EleutherAI, available on platforms such as Mozilla.ai Blueprints, can also help with accessing or building similar datasets.