Dataset - LDM

AccentNet

The accented English speech recognition challenge 2020: Open datasets, tracks, baselines, results and methods.
- Dataset
- JSON
GigaSpeech

GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio.
- Dataset
- JSON
TED-LIUM 3

TED-LIUM 3 (TL3) is a TED talks dataset. The speaker adaptation data for TL3 was split randomly: 2/5 went to the train set, 1/5 to the dev set, and...
- Dataset
- JSON
ASRU 2019 Mandarin-English code-switching speech recognition challenge

The ASRU 2019 Mandarin-English code-switching speech recognition challenge dataset.
- Dataset
- JSON
Wall Street Journal

The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.
- Dataset
- JSON
Video Corpus

A corpus of free and representative video content was gathered. It includes progressive-scan videos with 1280x720 resolution and frame rates between 24 and 30 frames...
- Dataset
- JSON
Convolutional Neural Networks for Speech Recognition

The Speech Recognition dataset is used for speech recognition tasks.
- Dataset
- JSON
Data set B

The dataset used for performing continuous speech recognition experiments using EEG features.
- Dataset
- JSON
Data set A and B

The dataset used for performing isolated and continuous speech recognition experiments using EEG features.
- Dataset
- JSON
LibriSpeech: An ASR Corpus Based on Public Domain Audio Books

LibriSpeech: an ASR corpus based on public domain audio books.
- Dataset
- JSON
Libriheavy: A 50,000 Hours ASR Corpus with Punctuation Casing and Context

Libriheavy is a large-scale ASR corpus consisting of 50,000 hours of read English speech derived from LibriVox. To the best of our knowledge, Libriheavy is the largest...
- Dataset
- JSON
CHiME-2

The CHiME-2 dataset is a speech separation and recognition challenge dataset. It contains 7138 utterances of 8 speakers, each with 10 seconds of speech.
- Dataset
- JSON
MHINT

The MHINT corpus is a Mandarin Chinese speech corpus used for speech recognition and speech enhancement. It contains 480 utterances of 10 speakers, each with 10 seconds of speech.
- Dataset
- JSON
DeepMine

DeepMine is a Persian speech corpus.
- Dataset
- JSON
Google Speech Command

Google Speech Command (GSC) is a dataset for limited-vocabulary speech recognition.
- Dataset
- JSON
ATIS dataset

The ATIS dataset is a benchmark dataset for spoken language understanding, consisting of audio recordings and corresponding manual transcripts about humans asking for flight...
- Dataset
- JSON
UK-PODS-ALIGN

This work showcases a cost-effective method for generating training data for speech processing tasks. UK-PODS-ALIGN is a dataset that features modern conversational...
- Dataset
- JSON
MCV-10

This work showcases a cost-effective method for generating training data for speech processing tasks. MCV-10 is a multilingual dataset that contains 50 hours of...
- Dataset
- JSON
UK-PODS

This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS dataset features modern conversational Ukrainian.
- Dataset
- JSON
TIMIT Corpus

The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
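
As a rough illustration of programmatic access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint. This page does not confirm that layout, so check the API Docs before relying on it. The base URL `https://registry.example.org` and the helper names `build_search_url`, `extract_titles`, and `search` are hypothetical placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL; replace it with the registry's real address.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style /api/3/action/package_search route."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(payload):
    """Return (total match count, titles on this page) from a CKAN-style response dict."""
    if not payload.get("success"):
        raise ValueError("API reported failure")
    result = payload["result"]
    titles = [d.get("title", d.get("name", "")) for d in result["results"]]
    return result["count"], titles


def search(query, rows=20, start=0):
    """Fetch one page of results over the network and summarize it."""
    with urlopen(build_search_url(query, rows, start)) as resp:
        return extract_titles(json.load(resp))
```

Under these assumptions, `search("speech recognition")` would return the total number of matches together with the titles on the first page. Increasing `start` in steps of `rows` pages through the remaining results.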

97 datasets found