Dataset - LDM

Google LLC dataset

A dataset of audio recordings used for training and evaluation of the Binaural Angular Separation Network.
- Dataset
- JSON
LibriVox and Freesound datasets

A combination of LibriVox and Freesound datasets used for training and evaluation.
- Dataset
- JSON
Voxceleb2

The Voxceleb2 dataset is a large-scale speaker recognition dataset containing 2,442 hours of raw speech from 6,112 speakers.
- Dataset
- JSON
MUSAN: A Music, Speech, and Noise Corpus

MUSAN is a corpus of music, speech, and noise recordings.
- Dataset
- JSON
Isolet dataset

The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
- Dataset
- JSON
How-2 Dataset

The How-2 dataset contains 2,000 hours of instructional videos with corresponding speech, text transcripts, translations, and summaries.
- Dataset
- JSON
NIST 2004

The NIST 2004 dataset is used to evaluate the quality of fake samples generated with the Generative Adversarial Networks framework.
- Dataset
- JSON
WSJ0-2mix

The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
- Dataset
- JSON
Multimodal Categorization Task

The dataset used in the paper supports a multimodal categorization task that combines image data and speech signals.
- Dataset
- JSON
Database in [28]

The database from [28], which was used to evaluate SEGAN in [14].
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
- JSON
Vietnamese Speech Dataset for Named Entity Recognition

The first Vietnamese speech dataset for the NER task, and the first pre-trained public large-scale monolingual language model for Vietnamese that achieved the new state-of-the-art...
- Dataset
- JSON
L2-Arctic

A dataset used for mispronunciation detection for second-language learners.
- Dataset
- JSON
DAC

The dataset used in this paper is a speech dataset used for training and testing the proposed LaDiffCodec model.
- Dataset
- JSON
EnCodec

The dataset used in this paper is a speech dataset used for training and testing the proposed LaDiffCodec model.
- Dataset
- JSON
Noisy mixtures dataset

The dataset used in the paper is a selection of 14 noisy mixtures created manually from the Voice Bank speech corpus.
- Dataset
- JSON
Dataset for speech enhancement

The dataset used in the paper is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of whom has around 400 clean...
- Dataset
- JSON
Voice Bank speech corpus

A selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of whom has around 400 clean...
- Dataset
- JSON
VCTK Corpus

The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
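
As a rough illustration of API access, the sketch below builds a search request and reads the dataset titles from the response. It assumes the registry exposes a CKAN-style action API (a `package_search` endpoint that returns `success`, `result.count`, and `result.results`); this page does not confirm that. The base URL and the helper names `build_search_url` and `extract_titles` are hypothetical, so check the API Docs for the real endpoint before relying on any of it.

```python
import json
from urllib.parse import urlencode

# Hypothetical API root: replace with the value given in the registry's API Docs.
BASE_URL = "https://registry.example.org/api/3/action"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/package_search?{params}"


def extract_titles(response_text):
    """Return (total_count, titles) from a CKAN-style JSON search response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("API reported failure")
    result = payload["result"]
    titles = [item.get("title", "") for item in result["results"]]
    return result["count"], titles


if __name__ == "__main__":
    # Request the second page of results (entries 21 onward).
    print(build_search_url("speech", rows=20, start=20))

    # Parse a hand-written sample response instead of calling the network.
    sample = json.dumps({
        "success": True,
        "result": {"count": 31, "results": [{"title": "TIMIT"}, {"title": "VCTK Corpus"}]},
    })
    print(extract_titles(sample))
```

If the assumption holds, the `start` parameter is how a client would page past the first 20 entries of a listing like this one.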

31 datasets found