The dataset used in the paper is WSJ0-2mix, which contains 30 hours of training data and 10 hours of validation data, both generated by mixing utterances from the WSJ0 corpus. The speech...
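As a rough illustration of how two-speaker mixtures of this kind are typically created, the sketch below mixes two signals at a randomly sampled relative level (WSJ0-2mix draws relative levels uniformly from 0 to 5 dB); the helper name `mix_at_snr` and the synthetic stand-in signals are assumptions for illustration, not the paper's actual pipeline.

```python
import numpy as np

def mix_at_snr(s1, s2, snr_db):
    """Scale s2 so s1 is snr_db louder (in average power), then sum.

    Hypothetical helper: returns the mixture and both (scaled) references.
    """
    p1 = np.mean(s1 ** 2)
    p2 = np.mean(s2 ** 2)
    # Gain that makes power(s1) / power(gain * s2) equal 10 ** (snr_db / 10).
    gain = np.sqrt(p1 / (p2 * 10 ** (snr_db / 10)))
    s2_scaled = s2 * gain
    return s1 + s2_scaled, s1, s2_scaled

rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)  # stand-ins for two 1 s WSJ0 utterances at 16 kHz
s2 = rng.standard_normal(16000)
snr = rng.uniform(0.0, 5.0)      # relative level sampled in [0, 5] dB
mix, ref1, ref2 = mix_at_snr(s1, s2, snr)
```

The clean references are kept alongside the mixture because separation training objectives (e.g. permutation-invariant losses) need per-speaker targets.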
The CHiME-2 dataset is a speech separation and recognition challenge corpus. It contains 7138 utterances from 8 speakers, each roughly 10 seconds long.
The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each clip is paired with an audio track corresponding to a sentence of up to 100 characters.
Generative models have attracted growing attention in recent years for their remarkable success in tasks that require estimating and sampling from a data distribution to generate...
The WHAM! dataset is used to evaluate the proposed Bayesian factorised speaker-environment adaptive training and test-time adaptation approach for Conformer models.