28 datasets found

Formats: JSON; Tags: Speech

  • MUSAN: A Music, Speech, and Noise Corpus

    MUSAN is a Music, Speech, and Noise Corpus.
  • Isolet dataset

    The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.
  • How-2 Dataset

    The How-2 dataset contains 2,000 hours of instructional videos with corresponding text transcripts, speech, translations, and summaries.
  • NIST 2004

    The NIST 2004 dataset is used to evaluate the quality of fake samples generated with the Generative Adversarial Networks framework.
  • WSJ0-2mix

    The dataset used in the paper is the WSJ0-2mix dataset, which contains 30 hours of training data and 10 hours of validation data generated from the WSJ0 dataset. The speech...
  • Multimodal Categorization Task

    The dataset used in the paper supports a multimodal categorization task and combines image data with speech signals.
  • Database in [28]

    The database from [28], which was used to evaluate SEGAN in [14].
  • TIMIT dataset

    The dataset used in this paper covers a phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • TIMIT

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
  • Vietnamese Speech Dataset for Named Entity Recognition

    The first Vietnamese speech dataset for the NER task, and the first public large-scale pre-trained monolingual language model for Vietnamese that achieved the new state-of-the-art...
  • L2-Arctic

    A dataset used for mispronunciation detection in second-language learners.
  • DAC

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • EnCodec

    The dataset used in this paper is a speech dataset, which is used for training and testing the proposed LaDiffCodec model.
  • Noisy mixtures dataset

    The dataset used in the paper is a selection of 14 noisy mixtures created manually from the Voice Bank speech corpus.
  • Dataset for speech enhancement

    The dataset used in the paper is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of whom has around 400 clean...
  • Voice Bank speech corpus

    A selection of ten British English speakers – both male and female – drawn from the Voice Bank speech corpus, each of whom has around 400 clean...
  • VCTK Corpus

    The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers.
  • CSTR VCTK Corpus

    The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each with 20 utterances.
  • RAVDESS

    The RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset features 24 professional actors (12 female, 12 male), offering performances of good quality and...
  • EMODB

    The EMODB dataset is a German-language speech library containing about 535 audio clips, each 1 to 10 seconds long, covering seven different emotional expressions.
You can also access this registry using the API (see API Docs).