Active Words Dataset
The dataset used in the paper is a list of the active words at each node of the speech recognition system, and is used to evaluate the system's performance.
Speech Recognition Dataset
The dataset used in the paper is a speech recognition dataset for a menu-based speech solution. It contains a list of active words at each node of the speech recognition system.
English and Luganda datasets for ASR-free keyword spotting
South African English and Luganda datasets
Radio browsing for developmental monitoring in Uganda
Radio browsing system for humanitarian relief
Feature learning for efficient ASR-free keyword spotting in low-resource languages
ASR-free keyword spotting in low-resource languages
WSJ and Switchboard datasets
The 80-hour WSJ and 300-hour Switchboard datasets are used for end-to-end speech recognition.
Qualcomm Keyword Speech Dataset
The Qualcomm Keyword Speech Dataset consists of 4270 utterances belonging to four classes, with variable durations from 0.48s to 1.92s.
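Utterances with variable durations like these (0.48s to 1.92s) are usually zero-padded or truncated to a fixed length before batching. A minimal NumPy sketch, assuming 16 kHz audio (the sample rate is not stated in the entry):

```python
import numpy as np

SAMPLE_RATE = 16_000                 # assumed sampling rate, not given in the entry
MAX_LEN = round(1.92 * SAMPLE_RATE)  # longest utterance: 30,720 samples

def fix_length(waveform: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad (or truncate) a 1-D waveform to exactly target_len samples."""
    if len(waveform) >= target_len:
        return waveform[:target_len]
    padded = np.zeros(target_len, dtype=waveform.dtype)
    padded[: len(waveform)] = waveform
    return padded

# A 0.48 s clip (7,680 samples) padded up to the 1.92 s maximum.
short = np.ones(round(0.48 * SAMPLE_RATE), dtype=np.float32)
print(fix_length(short, MAX_LEN).shape)  # (30720,)
```

Padding to the longest duration keeps every example the same shape at the cost of wasted computation on silence; an alternative is bucketing clips of similar length.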
Very Deep Multilingual Convolutional Neural Networks for LVCSR
Convolutional neural networks (CNNs) are a standard component of many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. However, CNNs in...
Speech commands: A dataset for limited-vocabulary speech recognition
Speech commands: A dataset for limited-vocabulary speech recognition.
Encoder-Decoder Neural Architecture Optimization for Keyword Spotting
Keyword spotting aims to identify specific keyword audio utterances. In recent years, deep convolutional neural networks have been widely utilized in keyword spotting systems.
CSTR VCTK Corpus
The CSTR VCTK Corpus is a dataset of speech recordings from 109 English speakers, each reading around 400 sentences.
Google Speech Commands Dataset
The Google Speech Commands Dataset contains 64,727 one-second-long utterances, each recorded and labeled with one of 30 target categories.
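The Speech Commands archive stores one directory per keyword, so the label of an utterance can be read off its parent directory. A short sketch of building a label-to-index map under that layout (the file names below are hypothetical):

```python
from pathlib import PurePosixPath

def label_from_path(path: str) -> str:
    """One directory per keyword: the parent directory name is the label."""
    return PurePosixPath(path).parent.name

# Hypothetical paths illustrating the archive layout.
paths = [
    "speech_commands/yes/0a7c2a8d_nohash_0.wav",
    "speech_commands/no/0a9f9af7_nohash_0.wav",
    "speech_commands/yes/0b40aa8e_nohash_1.wav",
]
labels = sorted({label_from_path(p) for p in paths})
label_to_index = {label: i for i, label in enumerate(labels)}
print(label_to_index)  # {'no': 0, 'yes': 1}
```

Sorting the label set before numbering keeps the index assignment deterministic across runs.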
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide...
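A temporal convolution in this setting is causal: the output at time t depends only on inputs at times ≤ t, which is what lets the model run in streaming mode on-device. A toy NumPy sketch of a single causal 1-D filter (an illustration of the mechanism, not the paper's architecture):

```python
import numpy as np

def temporal_conv1d(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Causal 1-D convolution: left-pad with k-1 zeros so output[t]
    uses only x[t-k+1 .. t], never future samples."""
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1, dtype=x.dtype), x])
    return np.array([padded[t : t + k] @ kernel[::-1] for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
out = temporal_conv1d(x, np.array([0.5, 0.5]))  # two-tap moving average
print(out)  # [0.5 1.5 2.5 3.5]
```

Because each output needs only the last k-1 inputs, a streaming implementation keeps a small ring buffer of past samples rather than the whole signal.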
CommonVoice
The sequence-to-sequence approach is widely used in speech recognition (SR) nowadays, and many research works are dedicated to showing its capabilities relying on a single...
FedNST: Federated Noisy Student Training for Automatic Speech Recognition
Federated Noisy Student Training for Automatic Speech Recognition
VCTK Dataset
The VCTK dataset is a large corpus of speech recordings, each containing a single speaker and a single sentence.
Switchboard
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment.
TIMIT, Aurora-4, AMI, and LibriSpeech
Four different corpora are used for our experiments, which are TIMIT, Aurora-4, AMI, and LibriSpeech. TIMIT contains broadband 16kHz recordings of phonetically-balanced read...
BABEL dataset
The dataset used in this paper is the BABEL dataset, which contains 10881 motion sequences, with 65926 subsequences and the corresponding textual labels.
Librispeech
The LibriSpeech dataset is a large-scale corpus of approximately 1,000 hours of 16 kHz read English speech, derived from audiobooks in the LibriVox project.