Speech Recognition - Groups

Google Speech Command Dataset

The Google Speech Command Dataset is a dataset for keyword spotting, which is a task in speech recognition. The dataset contains 12 classes, including 10 keywords and two extra...

wav2vec 2.0

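wav2vec 2.0 is a self-supervised learning framework for speech representations, pre-trained on unlabeled audio and fine-tuned for speech recognition tasks, rather than a dataset in its own right.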

CHiME-4

The CHiME-4 dataset is a noisy speech recognition dataset recorded with a 6-channel microphone array, containing over 2 hours of speech.

Aurora-4

The Aurora-4 dataset is a broadband corpus designed for noisy speech recognition, based on the Wall Street Journal (WSJ0) corpus.

Switchboard Corpus

The Switchboard corpus is a collection of two-sided conversational telephone speech between speakers of American English.

Libri-Light

The dataset used in the paper is Libri-Light, a large collection of mostly unlabeled English audiobook speech drawn from LibriVox, the same source as LibriSpeech. The authors used this dataset to pre-train their proposed dual-mode ASR...

L2-Arctic

L2-Arctic is a corpus of non-native (L2) English speech, used for mispronunciation detection for second language learners.

Masked Acoustic Unit for Mispronunciation Detection and Correction

The proposed method uses the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction.

HKUST/MTS: A Very Large Scale Mandarin Telephone Speech Corpus

The HKUST/MTS dataset is a large corpus of Mandarin conversational telephone speech.

The Wall Street Journal Corpus

The WSJ dataset is a large corpus of read speech, in which speakers read sentences drawn from Wall Street Journal news text.

TIMIT Acoustic-Phonetic Continuous Speech Corpus

The TIMIT Acoustic-Phonetic Continuous Speech Corpus contains recordings of 630 speakers from eight major dialect regions of American English, each reading ten phonetically rich sentences.

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

SpecAugment is a data augmentation method for automatic speech recognition, which masks the mel-spectrogram along the time and frequency axes.
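
The masking described above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the function name `spec_augment`, the default mask widths, the use of a single mask per axis, and the mask value of zero are all assumptions made for the example, and the spectrogram is represented as a plain list of rows (mel channels) by columns (frames).

```python
import random

def spec_augment(mel, max_freq_width=8, max_time_width=20, mask_value=0.0, rng=None):
    """Return a copy of `mel` (n_mels x n_frames) with one frequency band
    and one time span overwritten by `mask_value`.

    All defaults here are illustrative assumptions, not values from the source.
    """
    rng = rng or random.Random()
    n_mels, n_frames = len(mel), len(mel[0])
    out = [row[:] for row in mel]  # copy so the input is left untouched

    # Frequency mask: pick a width f, then a start channel f0, and blank
    # f consecutive mel channels across all frames.
    f = rng.randint(0, min(max_freq_width, n_mels))
    f0 = rng.randint(0, n_mels - f)
    for i in range(f0, f0 + f):
        out[i] = [mask_value] * n_frames

    # Time mask: pick a width t, then a start frame t0, and blank
    # t consecutive frames across all mel channels.
    t = rng.randint(0, min(max_time_width, n_frames))
    t0 = rng.randint(0, n_frames - t)
    for row in out:
        row[t0:t0 + t] = [mask_value] * t

    return out

# Usage: an 80-channel, 100-frame dummy spectrogram filled with ones.
mel = [[1.0] * 100 for _ in range(80)]
augmented = spec_augment(mel, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the masks reproducible; in training one would normally draw fresh masks for every example.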

MixSpeech: Data Augmentation for Low-Resource Automatic Speech Recognition

MixSpeech is a data augmentation method for automatic speech recognition, which trains an ASR model by taking a weighted combination of two different speech features as the...
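
The weighted combination of two speech features mentioned above can be illustrated as follows. This is only a sketch under stated assumptions: the function name `mix_features`, the frame-by-frame mixing, and the choice to truncate both inputs to the shorter sequence are hypothetical, and how the mixed features enter training is not shown because the description is cut off.

```python
def mix_features(feats_a, feats_b, lam):
    """Frame-wise weighted combination lam * a + (1 - lam) * b.

    `feats_a` and `feats_b` are lists of frames, each frame a list of floats.
    Assumption: the longer sequence is truncated to the length of the shorter.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    n = min(len(feats_a), len(feats_b))
    return [
        [lam * x + (1.0 - lam) * y for x, y in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(feats_a[:n], feats_b[:n])
    ]

# Usage with two tiny feature sequences (2 and 3 frames, 2 dimensions each).
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
mixed = mix_features(a, b, lam=0.25)
print(mixed)  # [[4.0, 5.0], [6.0, 7.0]]
```

In mixup-style training the weight is usually sampled at random per batch, for example with `random.betavariate`; that sampling scheme is an assumption here and is not stated in the description.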

Speech Commands Dataset

The datasets used for training the keyword spotting model are ESC (Dataset for Environmental Sound Classification) and the Speech Commands Dataset.

Google Commands

This dataset has no description

BABEL

The BABEL dataset is a multilingual speech recognition dataset containing over 1,000 hours of speech from 6 languages.

Proprietary Speech Dataset

The proprietary speech dataset consists of 184 hours of high-quality US English speech spoken by 11 female and 10 male speakers.

WSJ

The WSJ corpus is a large-vocabulary continuous speech recognition dataset. It contains 36,416 sequences, representing around 80 hours of speech.

Speech Commands

The Speech Commands dataset consists of 105,809 one-second audio recordings of 35 spoken words sampled at 16 kHz. The raw speech commands dataset presents audio recordings as a...

Switchboard dataset

The dataset used in the paper is the Switchboard dataset, which contains telephone conversations.

194 datasets found