Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
Multichannel linear filters for speech recognition in noisy environments -
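For context, this is a standard way the rank-1 constraint is used in the multichannel Wiener filter (MWF); the paper's exact estimator may differ in notation and details. With speech and noise spatial covariance matrices \(\boldsymbol{\Phi}_{\mathbf{xx}}\) and \(\boldsymbol{\Phi}_{\mathbf{nn}}\), trade-off parameter \(\mu\), steering vector \(\mathbf{h}\), speech power \(\phi_s\), and reference-channel selector \(\mathbf{e}_1\):

```latex
\mathbf{w}_{\mathrm{MWF}}
  = \left(\boldsymbol{\Phi}_{\mathbf{xx}} + \mu\,\boldsymbol{\Phi}_{\mathbf{nn}}\right)^{-1}
    \boldsymbol{\Phi}_{\mathbf{xx}}\,\mathbf{e}_1 ,
\qquad
\boldsymbol{\Phi}_{\mathbf{xx}} \approx \phi_s\,\mathbf{h}\mathbf{h}^{H}
\;\Longrightarrow\;
\mathbf{w}_{\mathrm{R1\text{-}MWF}}
  = \frac{\boldsymbol{\Phi}_{\mathbf{nn}}^{-1}\boldsymbol{\Phi}_{\mathbf{xx}}}
         {\mu + \operatorname{tr}\!\left(\boldsymbol{\Phi}_{\mathbf{nn}}^{-1}\boldsymbol{\Phi}_{\mathbf{xx}}\right)}\,
    \mathbf{e}_1 .
```

The closed form on the right follows from the matrix inversion lemma once the speech covariance is assumed rank-1. -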
English Broadcast News (BN) dataset
The dataset used in this paper is the English Broadcast News (BN) dataset. -
Improvements to Deep Convolutional Neural Networks for LVCSR
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs), as they are better able to reduce spectral variation in the input signal. -
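As a rough illustration of that claim (not the paper's architecture), the sketch below applies a 2-D convolution to log-mel features and pools only along the frequency axis, so small spectral shifts of the same sound land in the same pooled activation; all layer sizes here are made up for the example.

```python
import torch
import torch.nn as nn

class ConvFrontEnd(nn.Module):
    """Minimal conv + frequency-pooling front end over log-mel features."""
    def __init__(self, n_filters=64):
        super().__init__()
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(9, 9), padding=4)
        # Pool only along the frequency axis (last dim), so a phone shifted by a
        # few mel bins still activates the same pooled unit.
        self.pool = nn.MaxPool2d(kernel_size=(1, 3), stride=(1, 3))

    def forward(self, x):                    # x: (batch, 1, time, mel)
        return self.pool(torch.relu(self.conv(x)))

feats = torch.randn(8, 1, 100, 40)           # 8 chunks, 100 frames, 40 mel bins
out = ConvFrontEnd()(feats)                  # -> (8, 64, 100, 13)
``` -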
Error Explainable Benchmark (EEB) dataset
The proposed Error Explainable Benchmark (EEB) dataset considers both speech- and text-level error types to diagnose and validate ASR models and post-processors. -
SLR41 and SLR44 datasets
The SLR41 and SLR44 datasets consist of pairs of audio recordings and corresponding transcripts. -
SLR35 and SLR36 datasets
The SLR35 and SLR36 datasets consist of 200,000 speech recordings from native speakers. -
Magic Data
The Magic Data dataset consists of 3.5 hours of Indonesian scripted speech from 10 speakers. -
TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets
The study uses the TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets for training and evaluation of the ASR system. -
Speech EEG Database
Two simultaneous speech-EEG recording databases were collected for this work. For database A, five female and five male subjects took part in the experiment. For database B, five male and three... -
LibriLight: A Benchmark for ASR with Limited or No Supervision
The LibriLight dataset is a large-scale speech corpus used for self-supervised speech recognition tasks. -
The DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM
The TIMIT Acoustic-Phonetic Continuous Speech Corpus CDROM is a widely used dataset for speech recognition tasks. -
Unsupervised Speech Recognition with N-Skipgram and Positional Unigram Matching
Training unsupervised speech recognition systems presents challenges due to GAN-associated instability, misalignment between speech and text, and significant memory demands. To... -
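The entry does not spell out the matching objective, but assuming the n-skipgram and positional unigram statistics are count distributions over unit sequences, a minimal sketch of computing such statistics (with made-up phoneme tokens) might look like this:

```python
from collections import Counter
from itertools import combinations

def skipgram_counts(tokens, n=2, max_gap=2):
    """Count n-skipgrams: ordered n-tuples of tokens whose consecutive
    indices are at most max_gap positions apart."""
    counts = Counter()
    for idxs in combinations(range(len(tokens)), n):
        if all(j - i <= max_gap + 1 for i, j in zip(idxs, idxs[1:])):
            counts[tuple(tokens[i] for i in idxs)] += 1
    return counts

def positional_unigram_counts(tokens, n_bins=4):
    """Count unigrams jointly with a coarse relative position in the sequence."""
    counts = Counter()
    length = max(len(tokens), 1)
    for i, tok in enumerate(tokens):
        counts[(tok, min(n_bins - 1, i * n_bins // length))] += 1
    return counts

print(skipgram_counts(["DH", "AH", "K", "AE", "T"]))
print(positional_unigram_counts(["DH", "AH", "K", "AE", "T"]))
```

Distribution-matching approaches of this kind compare such statistics between speech-derived units and text, avoiding the adversarial training step. -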
TI-46 Spoken Digits Recognition
The TI-46 spoken digits dataset comprises 5 speakers, each uttering each of the 10 digits 10 times (500 samples). -
The ESTER phase II evaluation campaign for the rich transcription of French broadcast news
The ESTER phase II evaluation campaign for the rich transcription of French broadcast news contains news reports. -
Stanford Neural Machine Translation Systems for Spoken Language Domain
Stanford neural machine translation systems for spoken language domain. -
Arabic Digits Dataset
The dataset used in this paper covers spoken digit recognition of Arabic digits from 0 to 9. -
Amazon Alexa Dataset
A 23,000-hour corpus of untranscribed, de-identified, far-field English voice command and voice query speech. -
Open Subtitles dataset
The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows. -
Loss Prediction: End-to-End Active Learning for Speech Recognition
End-to-end speech recognition systems usually require huge amounts of labeling resources, and annotating speech data is complicated and expensive. Active learning is the...
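As a hedged sketch of the general loss-prediction idea for active learning (not the paper's exact model): a small auxiliary head is trained to predict the recognizer's loss from pooled encoder features, and the unlabeled utterances with the highest predicted loss are sent for transcription. All module names and dimensions below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LossPredictor(nn.Module):
    """Tiny auxiliary head mapping a pooled encoder feature to a scalar predicted
    loss; in practice it would be trained jointly with the ASR model to regress
    the per-utterance training loss."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, pooled_feats):               # (batch, feat_dim)
        return self.net(pooled_feats).squeeze(-1)  # (batch,)

def select_for_labeling(loss_predictor, unlabeled_feats, budget):
    """Rank unlabeled utterances by predicted loss and return the indices of the
    top-`budget` ones, i.e. those the recognizer is expected to get most wrong."""
    with torch.no_grad():
        scores = loss_predictor(unlabeled_feats)
    return torch.topk(scores, k=budget).indices.tolist()

predictor = LossPredictor()
pool_feats = torch.randn(1000, 256)   # pooled encoder features of 1000 unlabeled utterances
to_label = select_for_labeling(predictor, pool_feats, budget=32)
``` -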