Speech Recognition - Groups

UK-PODS

This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS dataset features modern conversational Ukrainian.

TIMIT Corpus

The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.

TEDLIUM Corpus

The TEDLIUM corpus is a large-volume corpus used for speech recognition and text summarization.

AVA-Speech

The AVA-Speech dataset is a publicly available dataset of movies densely labeled with speech activity.

A Hybrid CNN-BiLSTM Voice Activity Detector

A hybrid CNN-BiLSTM voice activity detector (VAD) incorporating both convolutional neural network (CNN) and bidirectional long short-term memory...

TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...

TED Gesture

TED Gesture is a large-scale English-language dataset for co-speech gesture generation, composed of 1,766 TED videos on various topics with different narrators.

TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...

Audio Data

The dataset contains audio data from various sources, including podcasts, audiobooks, and voice assistants.

VFHQ

A dataset used for testing the AniTalker framework. It contains 4,242 unique speaker IDs and 17,108 video clips, with a cumulative duration of 55 hours.

AMI

AMI dataset for speaker diarization and recognition

Multi-Scale Octave Convolutions for Robust Speech Recognition

Multi-scale octave convolutional layers for robust speech recognition

LRS2

The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each video is accompanied by audio corresponding to a sentence of up to 100 characters.

LRS3

The LRS3 dataset is a large-scale dataset for visual speech recognition. It consists of thousands of spoken sentences from TED videos.

AMI Meeting Corpus

The AMI Meeting Corpus was collected in three instrumented rooms with meeting conversations. Each room has two microphone arrays to collect 100 hours of far-field...

OpenSubtitles dataset

Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...

Voice Bank Corpus

The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers.

Sanskrit ASR dataset

A dataset for Sanskrit ASR

वाक् सञ्चयः (/Vāksañcayaḥ/)

A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit

A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognitio...

The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as...

194 datasets found