The DIRHA English WSJ dataset is part of the DIRHA project, which focuses on speech interaction in domestic environments using distant microphones.
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Re...
The multi-stream paradigm of audio processing, in which several sources are simultaneously considered, has been an active research area for information fusion. The proposed...
Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy...
Multichannel linear filters for speech recognition in noisy environments
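As a rough illustration of the filter named in the title above, the sketch below computes a rank-1 speech-distortion-weighted multichannel Wiener filter for one frequency bin with the closed form w = Φnn⁻¹ Φxx e_ref / (μ + tr(Φnn⁻¹ Φxx)). It assumes the noise covariance Φnn is known and approximates the speech covariance by the principal eigencomponent of Φyy − Φnn. The function name `rank1_mwf` and this covariance estimator are illustrative assumptions, not necessarily the procedure of the cited work.

```python
import numpy as np

def rank1_mwf(phi_yy, phi_nn, mu=1.0, ref=0):
    """Rank-1 speech-distortion-weighted MWF for a single frequency bin (sketch).

    phi_yy: (M, M) noisy-speech spatial covariance (Hermitian).
    phi_nn: (M, M) noise spatial covariance (Hermitian, invertible).
    mu: noise-reduction / speech-distortion trade-off (mu=1 gives the plain MWF).
    ref: index of the reference microphone.
    """
    # Assumption: the speech covariance is taken as phi_yy - phi_nn and
    # replaced by its best rank-1 approximation (principal eigencomponent).
    eigvals, eigvecs = np.linalg.eigh(phi_yy - phi_nn)  # ascending eigenvalues
    lam = max(eigvals[-1], 0.0)  # largest eigenvalue, clipped at zero
    v = eigvecs[:, -1]
    phi_xx = lam * np.outer(v, v.conj())
    # Closed form: w = Phi_nn^{-1} Phi_xx e_ref / (mu + tr(Phi_nn^{-1} Phi_xx))
    a = np.linalg.solve(phi_nn, phi_xx)
    return a[:, ref] / (mu + np.trace(a).real)
```

With μ = 1 this coincides with the standard MWF; larger μ trades more speech distortion for more noise reduction. The enhanced reference-channel coefficient in that bin would then be `w.conj() @ y` for a noisy observation vector `y`.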
Realistic Multi-Microphone Data Simulation for Distant Speech Recognition
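The title above refers to simulating multi-microphone data. A common generic recipe, sketched here under stated assumptions, convolves a clean close-talk signal with one room impulse response per microphone and adds multichannel noise scaled to a target SNR. The function `simulate_multimic` and its arguments are hypothetical, and the sketch is not claimed to reproduce the exact simulation procedure of the cited work.

```python
import numpy as np

def simulate_multimic(clean, rirs, noise, snr_db):
    """Contaminate a clean signal for M microphones (generic sketch).

    clean: (T,) clean close-talk speech.
    rirs: (M, L) room impulse responses, one per microphone.
    noise: (M, >=T) multichannel noise recording.
    snr_db: target SNR in dB, measured jointly over all channels.
    """
    # Reverberant speech at each microphone: x_m = s * h_m, cut to the clean length
    reverb = np.stack([np.convolve(clean, h)[: len(clean)] for h in rirs])
    n = noise[:, : len(clean)]
    # Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db
    p_speech = np.mean(reverb ** 2)
    p_noise = np.mean(n ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return reverb + gain * n
```

Measuring the SNR jointly over all channels keeps the relative level differences between microphones that the impulse responses introduce; a per-channel SNR would be an equally plausible design choice.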
The ICSI Meeting Corpus
Corpus of Spontaneous Japanese
The Corpus of Spontaneous Japanese, described in "The Corpus of Spontaneous Japanese: Its design and evaluation" [30], is a dataset of spontaneous Japanese speech.
English Broadcast News (BN) dataset
The dataset used in this paper is the English Broadcast News (BN) dataset.
Deep Convolutional Neural Networks (CNNs) are more powerful than Deep Neural Networks (DNNs) because they are better able to reduce spectral variation in the input signal.
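One common explanation for this claim is that CNNs apply local filters shared across frequency and then max-pool over neighbouring bands, so a small shift of a spectral pattern (for example from speaker differences) changes the pooled features less than it changes the raw input. The toy sketch below illustrates that effect on a single frame; `conv_pool_freq`, the filter, and the pooling size are hypothetical choices, not the configuration used in the paper.

```python
import numpy as np

def conv_pool_freq(spectrum, kernel, pool):
    """Filter along frequency, then max-pool over adjacent filter outputs.

    spectrum: (F,) log-magnitude values for one frame.
    kernel: (K,) filter weights shared across all frequency positions.
    pool: number of adjacent filter outputs merged by max-pooling.
    """
    # Cross-correlation (no kernel flip), as in a CNN layer: shape (F - K + 1,)
    feat = np.correlate(spectrum, kernel, mode="valid")
    n = (len(feat) // pool) * pool  # drop the incomplete last pooling region
    return feat[:n].reshape(-1, pool).max(axis=1)

# A toy frame with a spectral bump, and the same bump shifted up by one bin.
frame = np.zeros(16)
frame[4:7] = [1.0, 3.0, 1.0]
shifted = np.roll(frame, 1)
kernel = np.array([1.0, 3.0, 1.0])

pooled_a = conv_pool_freq(frame, kernel, pool=4)
pooled_b = conv_pool_freq(shifted, kernel, pool=4)
```

Here both pooled vectors peak in the same band with the same value, while the peak of the unpooled filter response moves by one bin. The invariance is only partial: a neighbouring pooled band still changes.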
Text-Level Error Type Classification Criteria
The proposed text-level error type classification criteria cover 13 text-level error types that can occur in speech recognition.
Speech-Level Error Type Classification Criteria
The proposed speech-level error type classification criteria cover 24 sub-types of noise error and 13 sub-types of speaker characteristics.
Error Explainable Benchmark (EEB) dataset
The proposed Error Explainable Benchmark (EEB) dataset considers both speech- and text-level error types and is used to diagnose and validate ASR models and post-processors.
SLR41 and SLR44 datasets
The SLR41 and SLR44 datasets consist of pairs of audio recordings and corresponding transcripts.
SLR35 and SLR36 datasets
The SLR35 and SLR36 datasets consist of 200,000 speech recordings from native speakers.
Magic Data
The Magic Data dataset consists of 3.5 hours of scripted Indonesian speech from 10 speakers.
TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets
The study uses the TITML-IDN, Magic Data, Common Voice, SLR35, SLR36, SLR41, and SLR44 datasets for training and evaluation of the ASR system.
Airbus dataset
The Airbus dataset contains transcripts of ATC speech from the Vienna airport together with surveillance call-signs for each transcript.
Malorca dataset
The Malorca dataset consists of transcripts of ATC speech from the Vienna airport together with surveillance call-signs for each transcript. The LiveATC dataset contains...