VisSpeech

A dataset for audio-visual speech recognition, consisting of instructional videos with semantically related visual content.

Data and Resources

This dataset has no data

Cite this as

Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath (2025). Dataset: VisSpeech. https://doi.org/10.57702/ct0blch5

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field: Value
Created: January 3, 2025
Last update: January 3, 2025
Defined In: https://doi.org/10.48550/arXiv.2305.11095
Author: Puyuan Peng
More Authors: Brian Yan, Shinji Watanabe, David Harwath