31 datasets found

Tags: Speech

  • Google LLC dataset

    A dataset of audio recordings used for training and evaluation of the Binaural Angular Separation Network.
  • LibriVox and Freesound datasets

    A combination of LibriVox and Freesound datasets used for training and evaluation.
  • Voxceleb2

    The Voxceleb2 dataset is a large-scale speaker recognition dataset containing 2,442 hours of raw speech from 6,112 speakers.
  • MUSAN: A Music, Speech, and Noise Corpus

    MUSAN is a corpus of music, speech, and noise recordings.
  • Isolet dataset

    The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
  • How-2 Dataset

    The How-2 dataset contains 2,000 hours of instructional videos with corresponding speech, text transcripts, translations, and summaries.
  • NIST 2004

    The NIST 2004 dataset is used to evaluate the quality of fake samples generated with the Generative Adversarial Networks framework.
  • WSJ0-2mix

    The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
  • Multimodal Categorization Task

    The dataset used in the paper is a multimodal categorization task using image data and speech signals.
  • Database in [28]

    The database from [28], which was used to evaluate SEGAN in [14].
  • TIMIT dataset

    The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • TIMIT

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
  • Vietnamese Speech Dataset for Named Entity Recognition

    The first Vietnamese speech dataset for NER task, and the first pre-trained public large-scale monolingual language model for Vietnamese that achieved the new state-of-the-art...
  • L2-Arctic

    A dataset used for mispronunciation detection for second-language learners.
  • DAC

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • EnCodec

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • Noisy mixtures dataset

    The dataset used in the paper is a selection of 14 noisy mixtures created manually from the Voice Bank speech corpus.
  • Dataset for speech enhancement

    The dataset used in the paper is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of which has around 400 clean...
  • Voice Bank speech corpus

    The Voice Bank speech corpus is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of which has around 400 clean...
  • VCTK Corpus

    The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
You can also access this registry using the API (see API Docs).