17 datasets found

Formats: JSON; Tags: audio

  • Bach The Well-Tempered Clavier Book One and Two

    Bach The Well-Tempered Clavier Book One (WTC B1) and Bach The Well-Tempered Clavier Book Two (WTC B2) datasets.
  • Million Song Dataset

    Million Song Dataset is a collection of audio features and metadata for a million contemporary pop songs. Instead of storing any audio, the dataset consists of features derived...
  • AudioMNIST dataset

    The dataset used in the paper is the AudioMNIST dataset, which contains 30,000 audio recordings.
  • Distress Analysis Interview Corpus Wizard of Oz dataset (DAIC-WOZ)

    The Distress Analysis Interview Corpus Wizard of Oz (DAIC-WOZ) dataset.
  • Extended Distress Analysis Interview Corpus Wizard of Oz dataset (E-DAIC)

    The Extended Distress Analysis Interview Corpus Wizard of Oz (E-DAIC) corpus, presented in the Audio/Visual Emotion Challenge (AVEC) 2019.
  • VGGSound

    The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.
  • HuBERT Framework

    The dataset used in this paper is a self-supervised audio pre-training framework called HuBERT.
  • Speech Commands

    The Speech Commands dataset consists of 105,809 one-second audio recordings of 35 spoken words sampled at 16 kHz. The raw speech commands dataset presents audio recordings as a...
  • AudioCaps

    Audio-text retrieval aims at retrieving a target audio clip or caption from a pool of candidates given a query in another modality.
  • Clotho

    Automated audio captioning is a cross-modal translation task for describing the content of audio clips with natural language sentences.
  • ESC-50

    The dataset used for training the CNN in cough detection is composed of various modified audio clips gathered from open-source online sources. Each of these audio files...
  • Librispeech

    The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers.
  • LibriLight

    The dataset used in this paper is a large-scale production ASR system, which includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and...

  • CREMA-D

    The CREMA-D dataset is an audio-visual dataset for emotion recognition; each video contains both facial and acoustic emotional expressions.
  • VoxCeleb

    Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...
  • VCTK

    Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
  • LibriTTS

    A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.
You can also access this registry using the API (see API Docs).