Dataset - LDM

Bach The Well-Tempered Clavier Book One and Two

Datasets of Bach's The Well-Tempered Clavier, Book One (WTC B1) and Book Two (WTC B2).
- Dataset
- JSON
Million Song Dataset

The Million Song Dataset is a collection of audio features and metadata for a million contemporary pop songs. Instead of storing any audio, the dataset consists of features derived...
- Dataset
- JSON
AudioMNIST dataset

The AudioMNIST dataset contains 30,000 audio recordings of spoken digits.
- Dataset
- JSON
Distress Analysis Interview Corpus Wizard of Oz dataset (DAIC-WOZ)

The Distress Analysis Interview Corpus Wizard of Oz (DAIC-WOZ) dataset.
- Dataset
- JSON
Extended Distress Analysis Interview Corpus Wizard of Oz dataset (E-DAIC)

The Extended Distress Analysis Interview Corpus Wizard of Oz (E-DAIC) corpus, presented in the Audio/Visual Emotion Challenge (AVEC) 2019.
- Dataset
- JSON
VGGSound

The VGGSound dataset is a large-scale audio-visual dataset containing roughly 200,000 10-second video clips with corresponding audio.
- Dataset
- JSON
HuBERT Framework

The dataset used in this paper is a self-supervised audio pre-training framework called HuBERT.
- Dataset
- JSON
Speech Commands

The Speech Commands dataset consists of 105,809 one-second audio recordings of 35 spoken words sampled at 16 kHz. The raw Speech Commands dataset presents audio recordings as a...
- Dataset
- JSON
AudioCaps

Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.
- Dataset
- JSON
Clotho

Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.
- Dataset
- JSON
ESC-50

The dataset used for training the CNN in cough detection is composed of various modified audio clips gathered from open-source online sources. Each of these audio files...
- Dataset
- JSON
Librispeech

The Librispeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech sampled at 16 kHz, derived from audiobooks.
- Dataset
- JSON
LibriLight

The dataset used in this paper is a large-scale production ASR system, which includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...
- Dataset
- JSON
CREMA-D

The CREMA-D dataset is an audio-visual dataset for emotion recognition in which each video contains both facial and acoustic emotional expressions.
- Dataset
- JSON
VoxCeleb

Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...
- Dataset
- JSON
VCTK

Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
- Dataset
- JSON
LibriTTS

A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract a phonetic posteriorgram (PPG) as the content representation.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

17 datasets found