Dataset - LDM

Freesound Dataset

The Freesound dataset consists of 18,873 audio files, each assigned one of 41 unique audio event labels from Google's AudioSet Ontology.
- Dataset
- JSON
COVID-19 Identification ResNet (CIdeR)

The COVID-19 Identification ResNet (CIdeR) dataset consists of 517 crowdsourced coughing and breathing audio recordings from 355 participants, of which 62 participants had tested...
- Dataset
- JSON
VoiceBank DEMAND dataset

Speech enhancement dataset
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
VCTK Corpus

The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
- Dataset
- JSON
Librispeech

The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers.
- Dataset
- JSON
VCTK

Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
- Dataset
- JSON
LibriTTS

A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

8 datasets found