-
Fixed-dimensional acoustic embeddings of variable-length segments in low-reso...
A dataset for the Zero Resource Speech Challenge 2015. -
The Zero Resource Speech Challenge 2015
A dataset for the Zero Resource Speech Challenge 2015. -
KWS-DailyTalk
KWS-DailyTalk is a five-shot KWS dataset aimed at detecting 15 different keywords, namely “afternoon”, “airport”, “cash”, “credit card”, “deposit”, “dollar”, “evening”,... -
WIT3 Parallel Corpus
The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks. -
VoxForge dataset
The VoxForge dataset is a collection of audio recordings of human speech. -
Isolet dataset
The Isolet dataset contains 4,000 13-channel audio recordings of 100 speakers. -
Hub5e-swb Dataset
The Hub5e-swb dataset is the Switchboard subset of the NIST Hub5 English evaluation set, consisting of conversational telephone speech recordings. -
Resource Management Audio-Visual (RMAV) dataset
The RMAV dataset consists of 20 British English speakers, each with up to 200 utterances of Resource Management (RM) sentences. -
AVLetters-2 (AVL2) dataset
The AVL2 dataset consists of seven utterances per speaker reciting the alphabet. -
End-to-End Neural Speaker Diarization with Permutation-Free Objectives
The End-to-End Neural Speaker Diarization dataset is a benchmark for speaker diarization. -
The Third DIHARD Diarization Challenge
The DIHARD dataset is a benchmark for speaker diarization. -
Perception of Phonological Assimilation
The dataset used in this study consists of 48 stimuli, each containing a word pair with a place assimilation, and a carrier sentence. The stimuli are designed to test the... -
FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge
Speech recognition systems driven by Deep Neural Networks (DNNs) have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our... -
Corpus of Spoken Dutch
The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings. -
Language Models of Spoken Dutch
The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT. The dataset is used to train language models of spoken Dutch. -
NIST RT-03 English CTS
The dataset is used for speaker diarization tasks. -
AudioMNIST dataset
The AudioMNIST dataset contains 30,000 audio recordings. -
People’s Speech
The People’s Speech is a large-scale, diverse English speech recognition dataset for commercial usage.