VisSpeech

VisSpeech is a dataset for audio-visual speech recognition. It consists of instructional videos whose visual content is semantically related to the speech.

Data and Resources

This dataset has no data

Puyuan Peng, Brian Yan, Shinji Watanabe, David Harwath (2025). Dataset: VisSpeech. https://doi.org/10.57702/ct0blch5

Private DOI: This DOI is not yet resolvable.
It is available for use in manuscripts and will be published when the dataset is made public.

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.2305.11095
Author	Puyuan Peng
More Authors	Brian Yan, Shinji Watanabe, David Harwath