41 datasets found

Tags: Audio

  • TIMIT

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
  • Schizophrenia Spectrum Dataset

    The dataset used for this study was collected for a mental health assessment project conducted at the University of Maryland School of Medicine in collaboration with the...
  • CHiME-4

    The CHiME-4 dataset is a large-scale speech recognition dataset, containing over 2 hours of speech from 6 channels.
  • L2-Arctic

    A dataset used for mispronunciation detection in second-language learners.
  • GTZAN

    The GTZAN dataset is a collection of 1,000 audio tracks, each 30 seconds long, representing ten music genres.
  • LJSpeech-1.1

    The LJSpeech-1.1 dataset is a large-scale speech dataset containing approximately 24 hours of single-speaker speech recorded at 22,050 Hz.
  • DAC

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • EnCodec

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • VCTK Corpus

    The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
  • CSTR VCTK Corpus

    The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
  • ASVspoof 2021 LA

    The ASVspoof 2021 LA dataset is used for testing the generalization capabilities of our model.
  • ASVspoof 2019 LA

    The ASVspoof 2019 LA dataset encompasses three types of speaker representation: d-vector, one-hot embedding, and VAE.
  • LJ Speech Dataset

    The LJ Speech dataset consists of speech samples recorded from a single speaker reading passages from 7 non-fiction books.
  • Maestro dataset

    The Maestro dataset contains MIDI and audio files of recorded performances from piano performance competitions.
  • Landscape

    Generating coherent and natural movement is the key challenge in video generation. This research proposes to condense video generation into a problem of motion generation, to...
  • Librispeech

    The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1,080 hours of speech, 5,600 utterances, and 1,000 speakers.
  • VoxCeleb

    Speaker verification systems experience significant performance degradation when tasked with short-duration trial recordings. To address this challenge, a multi-scale feature...
  • VCTK

    Voice conversion (VC) is a technique that alters the voice of a source speaker to a target style, such as speaker identity, prosody, and emotion, while keeping the linguistic...
  • VoxCeleb1

    Speaker recognition aims to identify speaker information from input speech. A type of speaker recognition is speaker verification (SV). It determines whether the test speaker's...
  • CASIA

    The CASIA dataset contains palmprint images from over 100 individuals, captured under various lighting conditions.
You can also access this registry using the API (see API Docs).
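
As a rough illustration of what API access to a tag-filtered listing like this one might look like, the sketch below assumes the registry exposes a CKAN-style `package_search` action. That is an assumption, not something this page confirms, and the base URL is a placeholder; the API Docs are the authority on the real endpoint and parameters. The sample response is hand-written to run offline and is not data returned by the registry.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with the registry's real address from the API Docs.
BASE_URL = "https://registry.example.org"


def build_search_url(tag, rows=20, start=0, base_url=BASE_URL):
    """Build a CKAN-style package_search URL filtered by a single tag (assumed API)."""
    params = {"fq": f'tags:"{tag}"', "rows": rows, "start": start}
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


def summarize(response):
    """Return (total match count, titles on this page) from a decoded JSON response."""
    result = response["result"]
    titles = [pkg.get("title") or pkg.get("name") for pkg in result["results"]]
    return result["count"], titles


# Hand-written sample in the assumed response shape, so the sketch runs offline.
sample = {
    "success": True,
    "result": {
        "count": 41,
        "results": [
            {"name": "timit", "title": "TIMIT"},
            {"name": "gtzan", "title": "GTZAN"},
        ],
    },
}

url = build_search_url("Audio")
count, titles = summarize(sample)
print(url)
print(count, titles)
```

To query a live registry, fetch the built URL with any HTTP client and pass the decoded JSON to `summarize`; page through results by increasing `start` in steps of `rows`.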