AMI Meeting Corpus
The AMI Meeting Corpus consists of meeting conversations recorded in three instrumented rooms. Each room has two microphone arrays to collect 100 hours of far-field...
OpenSubtitles dataset
Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
Voice Bank Corpus
The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers.
Sanskrit ASR dataset
A dataset for Sanskrit ASR.
वाक् सञ्चयः (/Vāksañcayaḥ/)
A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit.
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognitio...
The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as...
Google Speech Command Dataset
The Google Speech Command Dataset is a dataset for keyword spotting, the speech recognition task of detecting a small set of predefined words in audio. The dataset contains 12 classes, including 10 keywords and two extra...
wav2vec 2.0
wav2vec 2.0 is a framework for self-supervised learning of speech representations from unlabeled audio; the pre-trained model is then fine-tuned for speech recognition tasks.
Switchboard Corpus
The Switchboard corpus is a collection of spontaneous two-speaker telephone conversations among speakers from across the United States.
Libri-Light
The dataset used in the paper is Libri-Light, a corpus of roughly 60,000 hours of unlabeled English speech drawn from LibriVox audiobooks, the same source as LibriSpeech. The authors used this dataset to pre-train their proposed dual-mode ASR...
Masked Acoustic Unit for Mispronunciation Detection and Correction
The proposed method uses acoustic units (AUs) as the intermediate representation for both mispronunciation detection and correction.
HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus
The HKUST/MTS corpus is a large collection of spontaneous Mandarin telephone conversations.
The Wall Street Journal Corpus
The WSJ corpus consists of read speech: speakers reading sentences drawn from Wall Street Journal news text.
TIMIT Acoustic-Phonetic Continuous Speech Corpus
The TIMIT acoustic-phonetic continuous speech corpus CD-ROM contains recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences.
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
SpecAugment is a data augmentation method for automatic speech recognition that operates directly on the log mel spectrogram: it warps the features along the time axis and masks blocks of consecutive frequency channels and time steps.
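The masking part of SpecAugment can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the reference implementation: it omits time warping and the optional cap on time-mask width relative to utterance length, it treats the spectrogram as a list of rows indexed spec[channel][frame], and the function name spec_augment, its default widths (F = 27, T = 100, one mask of each kind) and a mask value of 0.0 (which presumes mean-normalized features) are illustrative choices.

```python
import random
from typing import List, Optional

Spectrogram = List[List[float]]  # spec[channel][frame], e.g. log-mel features


def spec_augment(
    spec: Spectrogram,
    freq_mask_param: int = 27,   # F: max frequency-mask width (illustrative value)
    time_mask_param: int = 100,  # T: max time-mask width (illustrative value)
    num_freq_masks: int = 1,
    num_time_masks: int = 1,
    mask_value: float = 0.0,     # assumes mean-normalized features
    rng: Optional[random.Random] = None,
) -> Spectrogram:
    """Return a copy of `spec` with random frequency and time masks (no time warping)."""
    rng = rng or random.Random()
    out = [list(row) for row in spec]  # copy so the caller's features are not modified
    num_channels = len(out)
    num_frames = len(out[0]) if out else 0

    # Frequency masking: overwrite channels [f0, f0 + f), with f drawn from {0, ..., F}.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(freq_mask_param, num_channels))
        f0 = rng.randint(0, num_channels - f)
        for ch in range(f0, f0 + f):
            out[ch] = [mask_value] * num_frames

    # Time masking: overwrite frames [t0, t0 + t), with t drawn from {0, ..., T}.
    for _ in range(num_time_masks):
        t = rng.randint(0, min(time_mask_param, num_frames))
        t0 = rng.randint(0, num_frames - t)
        for row in out:
            row[t0:t0 + t] = [mask_value] * t
    return out
```

Each mask width is drawn uniformly up to its cap and the start index is chosen so the mask stays inside the spectrogram; because the input is copied, the same utterance can be augmented differently on every training epoch.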
MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition
MixSpeech is a data augmentation method for automatic speech recognition, which trains an ASR model by taking a weighted combination of two different speech features as the...
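A weighted combination of two utterances' features can be sketched as follows. This is a minimal, mixup-style illustration under several assumptions: features are (frames x dim) lists, the shorter utterance is zero-padded to the longer one's length, the weight lam is drawn from a Beta(alpha, alpha) distribution, and the training loss is the same lam-weighted combination of the losses against both transcripts. The names pad_frames, mix_features, mixspeech_loss and model_loss are hypothetical; model_loss stands in for whatever ASR loss (for example CTC) the model actually uses.

```python
import random
from typing import Callable, List, Optional, Tuple

Features = List[List[float]]  # one row per frame, one column per feature dimension


def pad_frames(x: Features, length: int) -> Features:
    """Zero-pad a (frames x dim) sequence to `length` frames (assumed padding scheme)."""
    dim = len(x[0])
    return [list(row) for row in x] + [[0.0] * dim for _ in range(length - len(x))]


def mix_features(x_i: Features, x_j: Features, lam: float) -> Features:
    """Return lam * x_i + (1 - lam) * x_j frame by frame, after padding to a common length."""
    length = max(len(x_i), len(x_j))
    a, b = pad_frames(x_i, length), pad_frames(x_j, length)
    return [[lam * u + (1.0 - lam) * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def mixspeech_loss(
    model_loss: Callable[[Features, object], float],  # hypothetical ASR loss, e.g. CTC
    x_i: Features, y_i: object,
    x_j: Features, y_j: object,
    alpha: float = 0.5,  # Beta(alpha, alpha) parameter; an assumed, mixup-style choice
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Mix two utterances and weight the losses for both transcripts by the same lam."""
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    x_mix = mix_features(x_i, x_j, lam)
    loss = lam * model_loss(x_mix, y_i) + (1.0 - lam) * model_loss(x_mix, y_j)
    return loss, lam
```

Reusing lam for both the input mixture and the loss weights keeps the supervision proportional to how much of each utterance is present in the mixed features.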
Speech Commands Dataset
The keyword spotting model was trained on two datasets: ESC (Dataset for Environmental Sound Classification) and the Speech Commands Dataset.