-
MUSAN: A Music, Speech, and Noise Corpus
MUSAN is a corpus of music, speech, and noise recordings. -
Isolet dataset
The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers. -
How-2 Dataset
The How-2 dataset contains 2,000 hours of instructional videos with corresponding speech, text transcripts, translations, and summaries. -
Multimodal Categorization Task
The dataset used in the paper supports a multimodal categorization task that combines image data and speech signals. -
Database in [28]
This is the database from [28], which was used to evaluate SEGAN in [14]. -
TIMIT dataset
The dataset used in this paper captures a phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated... -
Vietnamese Speech Dataset for Named Entity Recognition
The first Vietnamese speech dataset for the NER task, and the first public large-scale pre-trained monolingual language model for Vietnamese, which achieved a new state-of-the-art... -
Noisy mixtures dataset
The dataset used in the paper is a selection of 14 noisy mixtures created manually from the Voice Bank speech corpus. -
Dataset for speech enhancement
The dataset used in the paper is a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of whom has around 400 clean... -
Voice Bank speech corpus
This entry covers a selection of ten British English speakers – both male and female – from the Voice Bank speech corpus, each of whom has around 400 clean... -
VCTK Corpus
The VCTK corpus is an English multi-speaker dataset, with 44 hours of audio spoken by 109 native English speakers. -
CSTR VCTK Corpus
The CSTR VCTK Corpus is a dataset of speech recordings of 109 speakers, each reading about 400 sentences.