- DIRHA English WSJ
  The DIRHA English WSJ dataset is part of the DIRHA project, which focuses on speech interaction in domestic environments via distant microphones.
- A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition
  The multi-stream paradigm of audio processing, in which several sources are considered simultaneously, has been an active research area for information fusion. The proposed...
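As a rough sketch of the two-stage idea, the PyTorch snippet below trains per-stream encoders first (stage one) and then a small attention layer that weights each stream (stage two). All module names, sizes, and the fusion rule are illustrative assumptions, not the paper's exact architecture.

```python
# Illustrative sketch of two-stage multi-stream training; not the paper's model.
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Stage 1: a per-stream acoustic encoder, trainable as a single-stream model."""
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)

    def forward(self, x):                  # x: (batch, time, feat_dim)
        out, _ = self.rnn(x)
        return out                         # (batch, time, hidden)

class StreamFusion(nn.Module):
    """Stage 2: attention over per-stream summaries yields one weight per stream."""
    def __init__(self, hidden=256):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, stream_outputs):     # list of (batch, time, hidden), equal lengths
        summaries = torch.stack([h.mean(dim=1) for h in stream_outputs], dim=1)
        weights = torch.softmax(self.score(summaries), dim=1)   # (batch, n_streams, 1)
        stacked = torch.stack(stream_outputs, dim=1)            # (batch, n_streams, time, hidden)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)     # (batch, time, hidden)

# Stage 1: train each StreamEncoder as an ordinary single-stream ASR encoder.
# Stage 2: freeze (or fine-tune) the encoders and fit StreamFusion on top.
enc_a, enc_b = StreamEncoder(), StreamEncoder()
fusion = StreamFusion()
x_a, x_b = torch.randn(4, 100, 80), torch.randn(4, 100, 80)
fused = fusion([enc_a(x_a), enc_b(x_b)])   # (4, 100, 256)
```

The appeal of the two-stage split is that each encoder can be trained as an ordinary single-stream model before the fusion layer is fit on top.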
- Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments
  Multichannel linear filtering for speech recognition in noisy environments.
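For context, the speech-distortion-weighted multichannel Wiener filter has a well-known closed form once the speech covariance is constrained to rank one; the notation below is introduced here for illustration and is not taken from the entry:

```latex
% Rank-1 constraint: speech covariance \Phi_s = \sigma_s^2 \mathbf{h}\mathbf{h}^{\mathsf H};
% \Phi_n: noise covariance, \mu: speech-distortion weight (\mu = 1 gives the standard MWF),
% \mathbf{e}_{\mathrm{ref}}: selector of the reference microphone.
\mathbf{w}
  = \left(\Phi_s + \mu\,\Phi_n\right)^{-1} \Phi_s\,\mathbf{e}_{\mathrm{ref}}
  = \frac{\sigma_s^2\,\Phi_n^{-1}\mathbf{h}\,\bigl(\mathbf{h}^{\mathsf H}\mathbf{e}_{\mathrm{ref}}\bigr)}
         {\mu + \sigma_s^2\,\mathbf{h}^{\mathsf H}\Phi_n^{-1}\mathbf{h}}
```

The second equality follows from the matrix inversion lemma and is what makes the rank-1 filter cheap to compute: only the noise covariance has to be inverted.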
- Realistic Multi-Microphone Data Simulation for Distant Speech Recognition
  An approach for simulating realistic multi-microphone distant-speech data by contaminating close-talking recordings with measured room impulse responses and environmental noise.
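The core recipe behind such simulation is contamination: convolve clean speech with a room impulse response and add background noise at a target SNR. A minimal sketch, assuming NumPy/SciPy and synthetic placeholder signals in place of real recordings:

```python
# Contamination sketch: clean speech -> reverberant speech + scaled noise.
import numpy as np
from scipy.signal import fftconvolve

def contaminate(clean, rir, noise, snr_db=10.0):
    """Simulate a distant-microphone signal from clean close-talking speech."""
    reverbed = fftconvolve(clean, rir)[: len(clean)]   # apply room acoustics
    noise = noise[: len(reverbed)]                     # trim noise to length
    speech_pow = np.mean(reverbed ** 2)
    noise_pow = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10*log10(speech_pow / (gain^2 * noise_pow)) == snr_db.
    gain = np.sqrt(speech_pow / (noise_pow * 10 ** (snr_db / 10.0)))
    return reverbed + gain * noise

# Synthetic stand-ins for real recordings (1 s at 16 kHz, decaying-echo RIR).
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.exp(-np.arange(4000) / 400.0) * rng.standard_normal(4000)
noise = rng.standard_normal(16000)
distant = contaminate(clean, rir, noise, snr_db=5.0)
```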
- The ICSI Meeting Corpus
  A corpus of about 75 natural research meetings (roughly 72 hours of speech) recorded at ICSI with both head-mounted and distant tabletop microphones.
- Corpus of Spontaneous Japanese
  The Corpus of Spontaneous Japanese [30] is a dataset of spontaneous Japanese speech.
- English Broadcast News (BN) dataset
  The English Broadcast News (BN) dataset is used for the experiments in this paper.
- Improvements to Deep Convolutional Neural Networks for LVCSR
  Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs) because they better reduce spectral variation in the input signal.
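A toy illustration of where that robustness comes from: convolution and pooling along the frequency axis let the network tolerate spectral shifts (e.g., speaker or vocal-tract variation) that a plain DNN would have to memorize. The PyTorch sketch below uses illustrative layer sizes, not the paper's topology.

```python
# Minimal CNN acoustic-model front-end over log-mel features (illustrative sizes).
import torch
import torch.nn as nn

cnn_frontend = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=(9, 9), padding=4),  # 2-D filters over (freq, time)
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=(3, 1)),                 # pool along frequency only
    nn.Conv2d(32, 64, kernel_size=(3, 3), padding=1),
    nn.ReLU(),
)

logmel = torch.randn(8, 1, 40, 100)   # batch of 40-band log-mel, 100 frames
features = cnn_frontend(logmel)       # (8, 64, 13, 100)
```

Pooling only along frequency (not time) is the usual design choice here, since the goal is invariance to spectral shifts while keeping the frame rate intact.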
- Text-Level Error Type Classification Criteria
  The proposed text-level error type classification criteria cover 13 text-level error types that can occur in speech recognition scenarios.
- Speech-Level Error Type Classification Criteria
  The proposed speech-level error type classification criteria cover 24 sub-types of noise error and 13 sub-types of speaker-characteristic error.
- Error Explainable Benchmark (EEB) dataset
  The proposed Error Explainable Benchmark (EEB) dataset covers both speech- and text-level error types and is designed to diagnose and validate ASR models and post-processors.
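The entry does not describe how EEB samples are stored; purely as a hypothetical illustration, a record in an error-annotated benchmark of this kind might pair an utterance with both kinds of tags. Every field name and label value below is an assumption, not the EEB schema.

```python
# Hypothetical record layout for an error-annotated ASR benchmark entry.
sample = {
    "audio_path": "utt_0001.wav",                   # placeholder path
    "reference": "turn the lights off",             # ground-truth transcript
    "hypothesis": "turn the light of",              # ASR output to diagnose
    "speech_level_errors": ["background_noise"],    # e.g., one noise sub-type
    "text_level_errors": ["deletion", "substitution"],
}
```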
- SLR41 and SLR44 datasets
  The SLR41 and SLR44 datasets consist of pairs of audio recordings and corresponding transcripts.
- SLR35 and SLR36 datasets
  The SLR35 and SLR36 datasets consist of 200,000 speech recordings from native speakers.
- Magic Data
  The Magic Data dataset consists of 3.5 hours of Indonesian scripted speech from 10 speakers.
- TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets
  The study uses the TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets for training and evaluating the ASR system.
- Airbus dataset
  The Airbus dataset contains transcripts of ATC speech from the Vienna airport together with surveillance call-signs for each transcript.
- Malorca dataset
  The Malorca dataset consists of transcripts of ATC speech from the Vienna airport together with surveillance call-signs for each transcript. The LiveATC dataset contains...