Dataset - LDM

TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
Schizophrenia Spectrum Dataset

The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
- Dataset
CHiME-4

The CHiME-4 dataset is a distant-microphone speech recognition dataset of read speech recorded in noisy everyday environments, containing over 2 hours of speech captured on 6 microphone channels.
- Dataset
L2-Arctic

The L2-ARCTIC dataset is a corpus of non-native English speech used for the task of mispronunciation detection for second-language (L2) learners.
- Dataset
GTZAN

The GTZAN dataset is a collection of 1,000 audio tracks, each 30 seconds long, evenly divided among ten music genres (100 tracks per genre).
- Dataset
LJSpeech-1.1

The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22 050 Hz.
- Dataset
DAC

The speech dataset used in the source paper to train and test the proposed LaDiffCodec model.
- Dataset
EnCodec

The speech dataset used in the source paper to train and test the proposed LaDiffCodec model.
- Dataset
VCTK Corpus

The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
- Dataset
CSTR VCTK Corpus

The CSTR VCTK Corpus is a dataset of English speech recordings from 109 speakers with various accents, each reading about 400 sentences.
- Dataset
ASVspoof 2021 LA

The ASVspoof 2021 LA (logical access) dataset is used to test the generalization capability of the proposed model.
- Dataset
ASVspoof 2019 LA

The ASVspoof 2019 LA (logical access) dataset contains spoofed speech generated by text-to-speech and voice conversion systems whose speaker representations include d-vectors, one-hot embeddings, and VAEs.
- Dataset
LJ Speech Dataset

The LJ Speech dataset consists of speech clips recorded by a single speaker reading passages from 7 non-fiction books.
- Dataset
Maestro dataset

The Maestro dataset contains MIDI and audio files of recorded performances from piano performance competitions.
- Dataset
Landscape

Generating coherent and natural movement is the key challenge in video generation. This research proposes to condense video generation into a problem of motion generation, to...
- Dataset
Librispeech

The LibriSpeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech derived from LibriVox audiobooks and sampled at 16 kHz.
- Dataset
VoxCeleb

Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...
- Dataset
VCTK

Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
- Dataset
VoxCeleb1

Speaker recognition aims to identify speaker information from input speech. One type of speaker recognition is speaker verification (SV), which determines whether the test speaker's...
- Dataset
CASIA

The CASIA dataset contains palmprint images from over 100 individuals, captured under various lighting conditions.
- Dataset

You can also access this registry using the API (see API Docs).
