- Hierarchical Conditional End-to-End ASR with CTC and Multi-Granular Subword U...
The proposed hierarchical conditional model of end-to-end ASR. The model is trained by gradually increasing the subword units for CTC losses applied to intermediate layers.
- IPA Transcription of Bengali Texts
A comprehensive study of IPA transcription issues and challenges for Bangla, a novel IPA transcription framework, a DUAL-IPA, a sentence-level IPA-transcribed parallel corpus...
- OpenSeq2Seq
A speech recognition dataset used with the OpenSeq2Seq framework.
- Kaldi Speech Recognition Toolkit
Kaldi is a widely used open-source toolkit for speech recognition.
- WAV2LETTER++
The dataset used in this paper is not explicitly mentioned, but it is implied to be a speech recognition dataset.
- Corpus of Spoken Dutch
The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings.
- Language Models of Spoken Dutch
The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT and is used to train language models of spoken Dutch.
- GigaSpeech
GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio.
- FASTINJECT: Injecting Unpaired Text Data into CTC-Based ASR Training
This paper proposes a flat-start joint training method, named FastInject, to inject unpaired text data into CTC-based ASR training.
- TED-LIUM 3
TED-LIUM 3 (TL3) is a dataset of TED talks. The speaker adaptation data for TL3 was split randomly: 2/5 went to the train set, 1/5 to the dev set, and...
- Speaker Anonymization using X-Vector and Neural Waveform Models
Speaker anonymization using x-vector and neural waveform models.
- NIST RT-03 English CTS
The dataset is used for speaker diarization tasks.
- HYPOTHESIS STITCHER FOR END-TO-END SPEAKER-ATTRIBUTED ASR ON LONG-FORM MULTI-...
An end-to-end (E2E) speaker-attributed automatic speech recognition (SA-ASR) model was proposed recently to jointly perform speaker counting, speech recognition and speaker...
- BD-4SK-ASR
BD-4SK-ASR is an experimental dataset used in the first attempt to develop an ASR system for Sorani Kurdish.
- VoiceHome-2
VoiceHome-2 is an extended corpus for multichannel speech processing in real homes.