Dataset - LDM

TEDLIUM Corpus

The TEDLIUM corpus is a large-volume corpus used for speech recognition and text summarization.
- Dataset
- JSON
A Hybrid CNN-BiLSTM VOICE ACTIVITY DETECTOR

A hybrid CNN-BiLSTM voice activity detector (VAD) incorporating both convolutional neural network (CNN) and bidirectional long short-term memory...
- Dataset
- JSON
TIMIT dataset

The dataset used in this paper is a collection of phonetically and phonologically local allophonic distribution in English, where voiceless stops surface as aspirated...
- Dataset
- JSON
TIMIT

The TIMIT corpus is a widely used benchmark for speech recognition tasks. It contains 3,696 training utterances from 462 speakers, excluding the SA sentences. The core test set...
- Dataset
- JSON
VFHQ

The dataset used for testing the AniTalker framework, containing 4,242 unique speaker IDs, 17,108 video clips, and a cumulative duration of 55 hours.
- Dataset
- JSON
AMI

AMI dataset for speaker diarization and recognition
- Dataset
- JSON
Multi-Scale Octave Convolutions for Robust Speech Recognition

Multi-scale octave convolutional layers for robust speech recognition
- Dataset
- JSON
LRS2

The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each video is accompanied by audio corresponding to a sentence of up to 100 characters.
- Dataset
- JSON
AMI Meeting Corpus

The AMI Meeting Corpus was collected in three instrumented rooms with meeting conversations. Each room has two microphone arrays to collect 100 hours of far-field...
- Dataset
- JSON
Voice Bank Corpus

The Voice Bank Corpus is a large regional accent speech database containing over 10 hours of speech data from 20 speakers.
- Dataset
- JSON
Sanskrit ASR dataset

A dataset for Sanskrit ASR
- Dataset
- JSON
वाक् सञ्चयः (/Vāksañcayaḥ/)

A new Sanskrit speech corpus and a large-vocabulary ASR system for Sanskrit
- Dataset
- JSON
Google Speech Command Dataset

The Google Speech Command Dataset is a dataset for keyword spotting, which is a task in speech recognition. The dataset contains 12 classes, including 10 keywords and two extra...
- Dataset
- JSON
CHiME-4

The CHiME-4 dataset is a large-scale speech recognition dataset, containing over 2 hours of speech from 6 channels.
- Dataset
- JSON
Aurora-4

The Aurora-4 dataset is a broadband corpus designed for noisy speech recognition tasks, based on the Wall Street Journal (WSJ0) corpus.
- Dataset
- JSON
Switchboard Corpus

The Switchboard corpus is a collection of two-sided conversational telephone speech recordings from speakers of American English.
- Dataset
- JSON
Libri-Light

The dataset used in the paper is Libri-Light, a large collection of mostly unlabeled English speech derived from LibriVox audiobooks. The authors used this dataset to pre-train their proposed dual-mode ASR...
- Dataset
- JSON
L2-Arctic

A dataset used for mispronunciation detection for second-language learners.
- Dataset
- JSON
Masked Acoustic Unit for Mispronunciation Detection and Correction

The proposed method uses the acoustic unit (AU) as the intermediary feature for both mispronunciation detection and correction.
- Dataset
- JSON
TIMIT Acoustic-Phonetic Continuous Speech Corpus

The TIMIT acoustic-phonetic continuous speech corpus CD-ROM contains a large collection of speech samples from 250 male and 250 female speakers.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
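
As a rough illustration of programmatic access, the sketch below assumes the registry exposes a CKAN-style `package_search` action; this page does not confirm that, so check the API Docs for the actual endpoints. The base URL is a hypothetical placeholder, and the helper names (`build_search_url`, `parse_search_response`, `search_datasets`) are invented for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical placeholder: replace with the registry's real base URL.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/api/3/action/package_search?{params}"


def parse_search_response(payload):
    """Return (total_count, [(title, description), ...]).

    Assumes the CKAN response layout: {"success": ..., "result": {"count": ..., "results": [...]}}.
    """
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    result = payload["result"]
    entries = [(d.get("title", ""), d.get("notes", "")) for d in result["results"]]
    return result["count"], entries


def search_datasets(query, rows=20, start=0, base_url=BASE_URL):
    """Fetch one page of search results over HTTP and parse it."""
    with urlopen(build_search_url(query, rows, start, base_url)) as resp:
        return parse_search_response(json.load(resp))
```

With a real base URL, `search_datasets("speech recognition", rows=20)` would return the total match count and one page of titles and descriptions; raising `start` in steps of `rows` pages through the rest.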

97 datasets found