AVSD dataset

The AVSD dataset is a benchmark for audio-visual scene-aware dialog. It consists of 7,659 training, 734 prototype validation, and 733 prototype testing dialogs. In each dialog, the Questioner has access only to the first, middle, and last static frames of the video, while the Answerer has access to the entire video, including the audio stream and the original input descriptions.
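
The sketch below shows how one might load and inspect a split of such a dataset in Python. It is a minimal illustration only: the file name and the field names ("dialogs", "caption", "dialog", "question", "answer") are assumptions about a JSON layout in the style of the DSTC7 AVSD release and may not match the official files.

    import json

    # Minimal sketch: load an AVSD-style dialog file and inspect it.
    # The path and field names below are assumptions, not the official schema.
    def load_avsd_split(path):
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return data.get("dialogs", [])

    if __name__ == "__main__":
        train_dialogs = load_avsd_split("avsd_train.json")  # hypothetical filename
        print(f"training dialogs: {len(train_dialogs)}")    # expected around 7,659
        if train_dialogs:
            first = train_dialogs[0]
            # Each dialog pairs one video with a caption and a sequence of QA turns.
            print("caption:", first.get("caption", ""))
            for turn in first.get("dialog", [])[:3]:
                print("Q:", turn.get("question"))
                print("A:", turn.get("answer"))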

Cite this as

Ye Zhu, Yu Wu, Yi Yang, Yan Yan (2025). Dataset: AVSD dataset. https://doi.org/10.57702/izroz79p

DOI retrieved: January 3, 2025

Additional Info

Field         Value
Created       January 3, 2025
Last update   January 3, 2025
Defined In    https://doi.org/10.48550/arXiv.1905.02442
Citation      https://doi.org/10.48550/arXiv.2106.14069
Author        Ye Zhu
More Authors  Yu Wu, Yi Yang, Yan Yan
Homepage      https://arxiv.org/abs/1904.09635