-
Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recogn...
The proposed black-box model watermarking framework for protecting the IP of ASR models. -
Query-by-example on-device keyword spotting
Query-by-example on-device keyword spotting. -
KWS-DailyTalk
KWS-DailyTalk is a five-shot KWS dataset aimed at detecting 15 different keywords, namely “afternoon”, “airport”, “cash”, “credit card”, “deposit”, “dollar”, “evening”,... -
Open Subtitles dataset
The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows. -
Loss Prediction: End-to-End Active Learning for Speech Recognition
End-to-end speech recognition systems usually require large amounts of labeling resources, and annotating speech data is complicated and expensive. Active learning is the... -
Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech...
This paper constructs targeted adversarial examples for automatic speech recognition that are imperceptible to human listeners and remain robust when played over the air. -
WIT3 Parallel Corpus
The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks. -
VoxForge dataset
The VoxForge dataset is a collection of audio recordings of human speech. -
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction o...
Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units -
wav2vec: Unsupervised Pre-Training for Speech Recognition
Unsupervised Pre-Training for Speech Recognition -
Isolet dataset
The dataset used in this paper is the Isolet dataset, which contains 4,000 13-channel audio recordings of 100 speakers. -
Fluent Speech Command dataset
The Fluent Speech Command dataset is a dataset for end-to-end spoken language understanding (SLU) tasks, consisting of single-channel audio clips sampled at 16 kHz. -
Hub5e-swb Dataset
The Hub5e-swb dataset is the Switchboard portion of the Hub5 2000 English evaluation set, consisting of conversational telephone speech. -
SEAME corpus
SEAME corpus is a Mandarin-English code-switching speech corpus. -
BABEL-Pashto
The BABEL-Pashto dataset is a multilingual speech recognition dataset containing Pashto speech recordings. -
EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition
Self-supervised ASR-TTS models suffer in out-of-domain data conditions. Here we propose an enhanced ASR-TTS model that incorporates two main features: 1) The ASR→TTS direction...