A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline

The Kazakh Speech Corpus (KSC) contains around 332 hours of transcribed audio, comprising over 153,000 utterances spoken by participants of both genders from different regions and age groups.

Cite this as

Yerbolat Khassanov, Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol (2024). Dataset: A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognition Baseline. https://doi.org/10.57702/axkv9mmv

DOI retrieved: December 3, 2024

Additional Info

Created: December 3, 2024
Last update: December 3, 2024
Author: Yerbolat Khassanov
More Authors: Saida Mussakhojayeva, Almas Mirzakhmetov, Alen Adiyev, Mukhamet Nurpeiissov, Huseyin Atakan Varol
Homepage: https://issai.nu.edu.kz/kz-speech-corpus/?version=1.1