35 datasets found

Tags: Audio

  • DAC

    The speech dataset used in the paper to train and test the proposed LaDiffCodec model.
  • EnCodec

    The speech dataset used in the paper to train and test the proposed LaDiffCodec model.
  • VCTK Corpus

    The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
  • CSTR VCTK Corpus

    The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
  • ASVspoof 2021 LA

    The ASVspoof 2021 LA dataset is used to test the generalization capability of the paper's proposed model.
  • ASVspoof 2019 LA

    The ASVspoof 2019 LA dataset is used in the paper to compare three types of speaker representation: d-vector, one-hot embedding, and VAE.
  • LJ Speech Dataset

    The LJ Speech dataset consists of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
  • Maestro dataset

    The MAESTRO dataset contains paired audio and MIDI recordings of performances from piano competitions.
  • Landscape

    Generating coherent and natural movement is the key challenge in video generation. This research proposes to condense video generation into a problem of motion generation, to...
  • Librispeech

    The LibriSpeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech derived from LibriVox audiobooks.
  • VoxCeleb

    Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...
  • VCTK

    Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
  • VoxCeleb1

    Speaker recognition aims to identify speaker information from input speech. A type of speaker recognition is speaker verification (SV). It determines whether the test speaker's...
  • CASIA

    The CASIA dataset contains palmprint images from over 100 individuals, captured under various lighting conditions.
  • LibriTTS

    A popular text-based VC approach is to use an automatic speech recognition (ASR) model to extract phonetic posteriorgram (PPG) as content representation.
You can also access this registry using the API (see API Docs).