- Unsupervised word segmentation and lexicon discovery using acoustic word embe...
  A dataset for the Zero Resource Speech Challenge 2015.
- Fixed-dimensional acoustic embeddings of variable-length segments in low-reso...
  A dataset for the Zero Resource Speech Challenge 2015.
- The Zero Resource Speech Challenge 2015
  A dataset for the Zero Resource Speech Challenge 2015.
- A segmental Bayesian framework for fully-unsupervised large-vocabulary speech...
  A segmental Bayesian model for full-coverage segmentation and clustering of conversational speech audio.
- Amazon Alexa Dataset
  A 23 thousand hour corpus of untranscribed, de-identified, far-field, English voice command and voice query speech.
- DeepSpeech
  The DeepSpeech dataset used for evaluation of the proposed watermarking scheme.
- Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recogn...
  The proposed black-box model watermarking framework for protecting the IP of ASR models.
- Query-by-example on-device keyword spotting
  Query-by-example on-device keyword spotting.
- KWS-DailyTalk
  KWS-DailyTalk is a five-shot KWS dataset aimed at detecting 15 different keywords, namely “afternoon”, “airport”, “cash”, “credit card”, “deposit”, “dollar”, “evening”,...
- Open Subtitles dataset
  The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows.
- Loss Prediction: End-to-End Active Learning for Speech Recognition
  End-to-end speech recognition systems usually require huge amounts of labeling resource, while annotating the speech data is complicated and expensive. Active learning is the...
- Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech...
  This paper presents a well-known music identification method and implements it as a neural net.
- WIT3 Parallel Corpus
  The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks.
- VoxForge dataset
  The VoxForge dataset is a collection of audio recordings of human speech.
- HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction o...
  Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
- wav2vec: Unsupervised Pre-Training for Speech Recognition
  Unsupervised Pre-Training for Speech Recognition
- Isolet dataset
  The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers.