TIMIT Corpus
The TIMIT corpus is a widely used database of read English speech recordings, used for tasks such as speaker recognition and speech synthesis.
TEDLIUM Corpus
The TEDLIUM corpus is a large corpus of TED talk recordings used for speech recognition and text summarization.
AVA-Speech
The AVA-Speech dataset is a publicly available dataset of movies densely labeled with speech activity.
A Hybrid CNN-BiLSTM Voice Activity Detector
A hybrid voice activity detector (VAD) that incorporates both a convolutional neural network (CNN) and a bidirectional long short-term memory...
TIMIT dataset
The dataset used in this paper is a collection of phonetically and phonologically local allophonic distributions in English, where voiceless stops surface as aspirated...
TED Gesture
TED Gesture is a large-scale English-language dataset for co-speech gesture generation, comprising 1,766 TED videos that cover various topics and speakers.
Audio Data
The dataset contains audio data from various sources, including podcasts, audiobooks, and voice assistants.
Multi-Scale Octave Convolutions for Robust Speech Recognition
Multi-scale octave convolutional layers for robust speech recognition.
AMI Meeting Corpus
The AMI Meeting Corpus consists of meeting conversations recorded in three instrumented rooms. Each room has two microphone arrays, which were used to collect 100 hours of far-field...
OpenSubtitles dataset
Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
Voice Bank Corpus
The Voice Bank Corpus is a large speech database covering regional accents, containing over 10 hours of speech from 20 speakers.
Sanskrit ASR dataset
A dataset for Sanskrit automatic speech recognition (ASR).
वाक् सञ्चयः (/Vāksañcayaḥ/)
A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit.
A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognitio...
The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as...