VisSpeech

VisSpeech is a dataset for audio-visual speech recognition. It consists of instructional videos with semantically related visual content.

BibTeX: