Speech Recognition - Groups

Speech Pattern Based Black-Box Model Watermarking for Automatic Speech Recogn...

A black-box model watermarking framework for protecting the intellectual property (IP) of automatic speech recognition (ASR) models.

Query-by-example on-device keyword spotting

Query-by-example on-device keyword spotting.

DailyTalk

DailyTalk: Spoken dialogue dataset for conversational text-to-speech.

KWS-DailyTalk

KWS-DailyTalk is a five-shot KWS dataset aimed at detecting 15 different keywords, namely “afternoon”, “airport”, “cash”, “credit card”, “deposit”, “dollar”, “evening”,...
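Few-shot keyword spotting of this kind is commonly approached by query-by-example matching: the five enrollment clips of each keyword are embedded, averaged into a single prototype, and a test clip is assigned to the closest prototype only if it is similar enough. The sketch below assumes embeddings come from some pretrained speech encoder that is not specified here; `build_prototypes`, `detect`, and the 0.7 cosine-similarity threshold are hypothetical choices for illustration, not the dataset's official baseline.

```python
import numpy as np


def build_prototypes(enrollment):
    """Average each keyword's few-shot enrollment embeddings into one unit-norm prototype.

    enrollment: dict mapping keyword -> list of embedding vectors (hypothetical encoder output).
    """
    protos = {}
    for keyword, embs in enrollment.items():
        mean = np.mean(np.asarray(embs, dtype=float), axis=0)
        norm = np.linalg.norm(mean)
        if norm == 0:
            raise ValueError(f"zero-norm prototype for {keyword!r}")
        protos[keyword] = mean / norm
    return protos


def detect(query_emb, protos, threshold=0.7):
    """Return (best keyword or None, cosine similarity) for one query embedding.

    The 0.7 threshold is an assumed value; in practice it would be tuned on held-out data.
    """
    q = np.asarray(query_emb, dtype=float)
    norm = np.linalg.norm(q)
    if norm == 0:
        raise ValueError("zero-norm query embedding")
    q = q / norm
    best_kw, best_sim = None, -1.0
    for kw, proto in protos.items():
        sim = float(q @ proto)  # cosine similarity, since both vectors are unit-norm
        if sim > best_sim:
            best_kw, best_sim = kw, sim
    return (best_kw if best_sim >= threshold else None), best_sim
```

Storing one averaged vector per keyword keeps memory and comparison cost small, which is why prototype matching is a natural fit for on-device use.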

Whisper

Whisper is a general-purpose speech recognition model.

Open Subtitles dataset

The Open Subtitles dataset consists of transcriptions of spoken dialog in movies and television shows.

Loss Prediction: End-to-End Active Learning for Speech Recognition

End-to-end speech recognition systems usually require large amounts of labeled data, while annotating speech data is complicated and expensive. Active learning is the...
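The truncated abstract does not spell out the selection rule. A minimal sketch of loss-prediction-based active learning, assuming a trained predictor that estimates the ASR loss an unlabeled utterance would incur, ranks the unlabeled pool by predicted loss and sends the highest-scoring utterances to annotators. `predict_loss` and `select_for_labeling` are hypothetical names, not the paper's code.

```python
from typing import Callable, List, Sequence


def select_for_labeling(
    utterances: Sequence[str],
    predict_loss: Callable[[str], float],
    budget: int,
) -> List[str]:
    """Return the `budget` utterances with the highest predicted loss.

    predict_loss is a hypothetical stand-in for a trained loss-prediction module;
    a high predicted loss is treated as "the model is likely to get this wrong".
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    ranked = sorted(utterances, key=predict_loss, reverse=True)  # highest loss first
    return list(ranked[:budget])
```

In a full loop, the selected utterances would be transcribed, added to the labeled set, and the ASR model and loss predictor retrained before the next round.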

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech...

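This paper constructs targeted adversarial examples for automatic speech recognition that are imperceptible to human listeners, using the psychoacoustic principle of auditory masking, and that remain effective under simulated room acoustics.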

LRW

The LRW dataset is an English-language lip-reading dataset containing up to 1,000 utterances of each of 500 different words, spoken by hundreds of different speakers.

WIT3 Parallel Corpus

The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks.

VoxForge dataset

The VoxForge dataset is a collection of audio recordings of human speech.

HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction o...

Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
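HuBERT pre-trains by predicting discrete cluster assignments (the "hidden units", obtained with offline k-means clustering) for frames whose input was masked. The NumPy sketch below shows only the masked cross-entropy term, assuming the model's per-frame logits, the cluster targets, and the boolean frame mask are already available; `masked_prediction_loss` is a hypothetical name and this is an illustration, not the reference implementation.

```python
import numpy as np


def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked frames only.

    logits:  (T, K) unnormalized scores over K hidden-unit clusters for each of T frames
    targets: (T,)   cluster IDs per frame, assumed to come from offline k-means
    mask:    (T,)   True where the input frame was masked
    """
    logits = np.asarray(logits, dtype=float)
    targets = np.asarray(targets, dtype=int)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask selects no frames")
    # Numerically stable log-softmax over the cluster dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the target cluster at every frame.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Unmasked frames are ignored, so the model must infer masked units from context.
    return float(nll[mask].mean())
```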

wav2vec: Unsupervised Pre-Training for Speech Recognition

Unsupervised Pre-Training for Speech Recognition

Isolet dataset

The Isolet dataset consists of recordings of spoken English letters; the version used in this paper contains 4,000 13-channel audio recordings of 100 speakers.

Fluent Speech Commands dataset

The Fluent Speech Commands dataset is a dataset for end-to-end spoken language understanding (SLU) tasks, consisting of single-channel audio clips sampled at 16 kHz.
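Because the clips are specified as single-channel 16 kHz audio, a quick sanity check before training is to confirm each file's channel count and sample rate. This sketch uses Python's standard `wave` module and assumes the clips are stored as uncompressed PCM WAV files; `is_mono_16k` is a hypothetical helper name.

```python
import wave


def is_mono_16k(path: str) -> bool:
    """Return True if the WAV file at `path` is single-channel and sampled at 16 kHz."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels() == 1 and wf.getframerate() == 16000
```

Files that fail the check could be skipped or resampled, depending on how strictly the training pipeline expects the stated format.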

Hub5e-swb Dataset

The Hub5e-swb dataset is the Switchboard portion of the NIST 2000 Hub5 English evaluation set (Hub5'00), consisting of conversational telephone speech between pairs of speakers.

AISHELL-1

The AISHELL-1 dataset is a Mandarin speech corpus of 178 hours of speech covering 11 domains, recorded by 400 speakers from different accent areas in China.

SEAME corpus

SEAME corpus is a Mandarin-English code-switching speech corpus.

BABEL-Pashto

The BABEL-Pashto dataset is the Pashto language pack of the IARPA Babel multilingual speech recognition program, containing Pashto speech recordings.

EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition

Self-supervised ASR-TTS models suffer in out-of-domain data conditions. Here we propose an enhanced ASR-TTS model that incorporates two main features: 1) The ASR→TTS direction...
