The LRS2 dataset consists of 48,164 video clips from outdoor shows on BBC television. Each clip is paired with the audio of a spoken sentence of up to 100 characters.
We propose Deep Visual Forced Alignment (DVFA), which time-aligns an input transcription with an input talking face video without using speech audio.
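The DVFA architecture itself is not detailed in this passage, but the core operation of forced alignment can be illustrated generically: given per-frame symbol probabilities (here they would come from a visual model rather than an acoustic one) and a known transcription, a monotonic dynamic-programming pass assigns each video frame to a position in the transcription. The sketch below is a hypothetical, minimal Viterbi-style aligner, not the authors' method; the function name `forced_align` and the toy inputs are assumptions for illustration.

```python
import numpy as np

def forced_align(log_probs, target):
    """Monotonic forced alignment via dynamic programming.

    log_probs: (T, V) array of per-frame log-probabilities over a symbol
               vocabulary (in DVFA these would be visually predicted).
    target:    list of S vocabulary indices (the transcription), S <= T.
    Returns a length-T list mapping each frame to a target position.
    """
    T, _ = log_probs.shape
    S = len(target)
    # dp[t, s]: best log-score aligning frames 0..t to target symbols 0..s,
    # where the alignment may only stay on a symbol or advance by one.
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)  # 0 = stayed on symbol, 1 = advanced
    dp[0, 0] = log_probs[0, target[0]]
    for t in range(1, T):
        for s in range(S):
            stay = dp[t - 1, s]
            advance = dp[t - 1, s - 1] if s > 0 else -np.inf
            if advance > stay:
                dp[t, s] = advance + log_probs[t, target[s]]
                back[t, s] = 1
            else:
                dp[t, s] = stay + log_probs[t, target[s]]
                back[t, s] = 0
    # Backtrace from the last frame, which must end on the last symbol.
    s = S - 1
    path = [s]
    for t in range(T - 1, 0, -1):
        s -= back[t, s]
        path.append(s)
    return path[::-1]
```

For example, with four frames whose probabilities favor symbol 0 for the first two frames and symbol 1 for the last two, aligning the transcription `[0, 1]` yields the frame-to-symbol mapping `[0, 0, 1, 1]`.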