Audio Visual Scene-aware Dialog dataset

The Audio Visual Scene-aware Dialog (AVSD) dataset requires systems to generate answers to questions about events observed in a video, conditioned on the preceding turns of the dialog.
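As a rough illustration of how this task framing turns one dialog into training examples, the sketch below expands a dialog about a single video into one example per turn. Each example pairs the current question with all earlier question/answer turns as history, and the target is the current answer. The `Turn`/`Example` classes, the `build_examples` helper, the field names, and the sample dialog are all hypothetical and are not the dataset's actual file format. The video features a real system would also consume are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    question: str
    answer: str


@dataclass
class Example:
    video_id: str                    # identifier of the video clip (hypothetical field)
    history: List[Tuple[str, str]]   # earlier (question, answer) turns in the dialog
    question: str                    # current question about the video
    answer: str                      # target answer the system must generate


def build_examples(video_id: str, dialog: List[Turn]) -> List[Example]:
    """Expand one dialog into one example per turn, each conditioned on all earlier turns."""
    examples = []
    for i, turn in enumerate(dialog):
        history = [(t.question, t.answer) for t in dialog[:i]]
        examples.append(Example(video_id, history, turn.question, turn.answer))
    return examples


if __name__ == "__main__":
    # Made-up dialog used only to show the expansion.
    dialog = [
        Turn("what is the man doing?", "he is sitting at a table."),
        Turn("does he say anything?", "no, he stays silent."),
    ]
    for ex in build_examples("VIDEO_ID", dialog):
        print(len(ex.history), ex.question, "->", ex.answer)
```

The first turn has an empty history, and each later turn sees every preceding turn, which mirrors the requirement that answers depend on the dialog so far.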

BibTeX: