Dataset - LDM

MUSAN: A Music, Speech, and Noise Corpus

MUSAN is a Music, Speech, and Noise Corpus.
- Dataset
- JSON
Isolet dataset

The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
- Dataset
- JSON
How-2 Dataset

The How-2 dataset contains 2,000h of instructional videos with corresponding text transcripts, video, speech, translations, and summaries.
- Dataset
- JSON
NIST 2004

The NIST 2004 dataset is used to evaluate the quality of fake samples generated with the Generative Adversarial Networks framework.
- Dataset
- JSON
WSJ0-2mix

The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
- Dataset
- JSON
Multimodal Categorization Task

The dataset used in the paper is a multimodal categorization task using image data and speech signals.
- Dataset
- JSON
Database in [28]

The database described in [28], used to evaluate SEGAN in [14].
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
- JSON
Vietnamese Speech Dataset for Named Entity Recognition

The first Vietnamese speech dataset for the NER task, and the first public, large-scale, pre-trained monolingual language model for Vietnamese that achieved the new state-of-the-art...
- Dataset
- JSON
L2-Arctic

A dataset used for mispronunciation detection for second-language learners.
- Dataset
- JSON
DAC

The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
- Dataset
- JSON
EnCodec

The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
- Dataset
- JSON
Noisy mixtures dataset

The dataset used in the paper is a selection of 14 noisy mixtures created manually from the Voice Bank speech corpus.
- Dataset
- JSON
Dataset for speech enhancement

The dataset used in the paper is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of which has around 400 clean...
- Dataset
- JSON
Voice Bank speech corpus

A selection of ten British English speakers, both male and female, from the Voice Bank speech corpus, each of which has around 400 clean...
- Dataset
- JSON
VCTK Corpus

The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
- Dataset
- JSON
CSTR VCTK Corpus

The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
- Dataset
- JSON
RAVDESS

The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset contains recordings of 24 professional actors (12 female, 12 male) to offer performances of good quality and...
- Dataset
- JSON
EMODB

The EMODB dataset is a German-language speech corpus containing about 535 audio clips, each 1 to 10 seconds long, covering seven different emotional expressions.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

28 datasets found