A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

doi:10.57702/axkv9mmv

The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as well as both genders.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol (2024). Dataset: A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline. https://doi.org/10.57702/axkv9mmv

DOI retrieved: December 3, 2024

Additional Info

Field	Value
Created	December 3, 2024
Last update	December 3, 2024
Author	Yerbolat Khassanov
More Authors	Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol
Homepage	https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1