Audio Visual Scene-aware Dialog dataset

The Audio Visual Scene-aware Dialog (AVSD) dataset requires systems to generate answers to questions about events observed in a video, taking the previous dialog turns into account.

Cite this as

C. Hori, H. Alamri, J. Wang, G. Wichern, T. Hori, A. Cherian, T.K. Marks, V. Cartillier, R.G. Lopes, A. Das (2024). Dataset: Audio Visual Scene-aware Dialog dataset. https://doi.org/10.57702/b2ugxqvi

DOI retrieved: November 25, 2024

Additional Info

Created: November 25, 2024
Last update: November 25, 2024
Defined In: https://doi.org/10.48550/arXiv.1911.11390
Author: C. Hori
More Authors: H. Alamri, J. Wang, G. Wichern, T. Hori, A. Cherian, T.K. Marks, V. Cartillier, R.G. Lopes, A. Das