-
TIMIT dataset
The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated... -
TED Gesture
TED Gesture is a large-scale English-based dataset for co-speech gesture generation, composed of 1766 TED videos from various topics and different narrators. -
Google Speech Command Dataset
The Google Speech Command Dataset is a dataset for keyword spotting, which is a task in speech recognition. The dataset contains 12 classes, including 10 keywords and two extra... -
wav2vec 2.0
The wav2vec 2.0 dataset is a self-supervised learning dataset for speech recognition tasks. -
Switchboard Corpus
The Switchboard corpus is a dataset of speech recordings from a switchboard, which is a device that allows multiple people to speak at the same time. -
TIMIT Acoustic-Phonetic Continuous Speech Corpus
The TIMIT acoustic-phonetic continuous speech corpusCD-ROM contains a large collection of speech samples from 250 male and 250 female speakers. -
Speech Commands Dataset
The dataset used for training the keyword spotting model is the ESC: Dataset for Environmental Sound Classification, and the Speech Commands Dataset. -
LRS2 dataset
The LRS2 dataset consists of news recordings from the BBC, with different lighting, backgrounds, face poses, and people with different origins. -
GRID dataset
The GRID dataset was introduced by [5] as a corpus for tasks such as speech perception and speech recognition. GRID contains 33 unique speakers, articulating 1000 word sequences... -
Speech Commands
The Speech Commands dataset consists of 105809 one-second audio recordings of 35 spoken words sampled at 16kHz. The raw speech commands dataset presents audio recordings as a... -
Switchboard
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. -
Librispeech
The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers. -
LibriLight
The dataset used in this paper is a large-scale production ASR system, which includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...