-
AudioMNIST dataset
The AudioMNIST dataset contains 30,000 audio recordings of spoken digits (0 to 9) from 60 speakers. -
Acoustic AVSpeech
The Acoustic AVSpeech dataset is a benchmark for visual acoustic matching. -
SoundSpaces-Speech
The SoundSpaces-Speech dataset is a benchmark for visual acoustic matching. -
AVA-Speech
The AVA-Speech dataset is a publicly available dataset of movies densely labeled with speech activity. -
A Hybrid CNN-BiLSTM Voice Activity Detector
A hybrid voice activity detector (VAD) incorporating both convolutional neural network (CNN) and bidirectional long short-term memory... -
LJSpeech Dataset
The LJSpeech dataset is a collection of 13,100 short audio clips of a single female speaker reading passages from seven non-fiction books, about 24 hours in total. -
LibriSpeech
The LibriSpeech dataset is a large-scale corpus of approximately 1,000 hours of read English speech sampled at 16 kHz, derived from LibriVox audiobooks. -
LibriLight
The dataset used in this paper is a large-scale production ASR system, which includes multi-domain (MD) data sets in English. The MD data sets include medium-form (MF) and... -
Google Speech Commands Dataset Version II
The Google Speech Commands Dataset Version II contains 105,829 utterances of 35 words from 2,618 speakers with a sampling rate of 16 kHz.