Open Subtitles dataset
The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows.
IPA Transcription of Bengali Texts
A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus...
ATIS dataset
The ATIS dataset is a benchmark dataset for spoken language understanding, consisting of audio recordings and corresponding manual transcripts about humans asking for flight...
Sanskrit ASR dataset
A dataset for Sanskrit ASR.
वाक् सञ्चयः (/Vāksañcayaḥ/)
A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit.
Google Speech Commands Dataset
The Google Speech Commands Dataset contains 64,727 one-second-long utterance files, each recorded and labeled with one of 30 target categories.
Switchboard
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment.