194 datasets found

Formats: JSON

  • UK-PODS

    This work presents a cost-effective method for generating training data for speech processing tasks. The resulting dataset, UK-PODS, features modern conversational Ukrainian.
  • TIMIT Corpus

    The TIMIT corpus is a large database of speech recordings used for speaker recognition and speech synthesis tasks.
  • TEDLIUM Corpus

    The TEDLIUM corpus is a large-volume corpus used for speech recognition and text summarization.
  • AVA-Speech

    The AVA-Speech dataset is a publicly available dataset of movies densely labeled with speech activity.
  • A Hybrid CNN-BiLSTM VOICE ACTIVITY DETECTOR

    A hybrid voice activity detector (VAD) incorporating both convolutional neural network (CNN) and bidirectional long short-term memory...
  • TIMIT dataset

    The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
  • TED Gesture

    TED Gesture is a large-scale English-language dataset for co-speech gesture generation, composed of 1,766 TED videos on various topics by different narrators.
  • TIMIT

    The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
  • Audio Data

    The dataset contains audio data from various sources, including podcasts, audiobooks, and voice assistants.
  • VFHQ

    The dataset used for testing the AniTalker framework, containing 4,242 unique speaker IDs, 17,108 video clips, and a cumulative duration of 55 hours.
  • AMI

    AMI dataset for speaker diarization and recognition
  • Multi-Scale Octave Convolutions for Robust Speech Recognition

    Multi-scale octave convolutional layers for robust speech recognition
  • LRS2

    The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each video is accompanied by audio corresponding to a sentence of up to 100 characters.
  • LRS3

    The LRS3 dataset is a large-scale dataset for visual speech recognition. It consists of thousands of spoken sentences from TED videos.
  • AMI Meeting Corpus

    The AMI Meeting Corpus was collected in three instrumented rooms with meeting conversations. Each room has two microphone arrays to collect 100 hours of far-field...
  • OpenSubtitles dataset

    Open-domain neural dialogue generation (Vinyals and Le, 2015; Sordoni et al., 2015; Li et al., 2016a; Mou et al., 2016; Serban et al., 2016a; Asghar et al., 2016; Mei et al.,...
  • Voice Bank Corpus

    The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers.
  • Sanskrit ASR dataset

    A dataset for Sanskrit ASR
  • वाक् सञ्चयः (/Vāksañcayaḥ/)

    A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit
  • A Crowdsourced Open-Source Kazakh Speech Corpus and Initial Speech Recognitio...

    The Kazakh speech corpus (KSC) contains around 332 hours of transcribed audio comprising over 153,000 utterances spoken by participants from different regions and age groups, as...