- COCO Captions and Localized Narratives
The paper generates image descriptions using COCO Captions and Localized Narratives: COCO Captions pairs each image with several human-written captions, while Localized Narratives pairs images with longer spoken narrations whose words are aligned to mouse traces over the image (see the loading sketch after this list).
- VQA-CP v2 and VQA 2.0
The paper evaluates on VQA 2.0, a standard benchmark for visual question answering, and VQA-CP v2, a reorganization of VQA 2.0 in which the answer distribution per question type differs between the train and test splits, so models cannot succeed by relying on language priors alone (see the sketch after this list).
- Visual Story-Telling dataset (VIST)
VIST is the only publicly available dataset for visual storytelling. It comprises 210,819 distinct images drawn from 10,117 different Flickr albums (see the final sketch after this list).
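As a minimal sketch of how the COCO captions are organized, the following assumes pycocotools is installed and the standard `captions_train2017.json` annotation file from the COCO 2017 release is available locally; Localized Narratives is distributed separately as JSON Lines files and can be read line by line with plain `json`.

```python
from pycocotools.coco import COCO

# Load the COCO Captions annotation file (path is an assumption;
# adjust to wherever the 2017 annotations were downloaded).
coco = COCO("annotations/captions_train2017.json")

# Each image id maps to several caption annotations.
first_img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=[first_img_id])
for ann in coco.loadAnns(ann_ids):
    print(first_img_id, ann["caption"])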
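For VQA 2.0 (and VQA-CP v2, which reuses the same record layout), questions and answers ship as separate JSON files. The sketch below assumes the file names and field names of the standard v2 release from visualqa.org; it pairs each question with its consensus answer.

```python
import json

# Standard VQA v2 train-split files (names per the official release).
with open("v2_OpenEnded_mscoco_train2014_questions.json") as f:
    questions = json.load(f)["questions"]
with open("v2_mscoco_train2014_annotations.json") as f:
    annotations = json.load(f)["annotations"]

# Index the consensus ground-truth answer by question_id,
# then join it back onto the question text.
answers = {a["question_id"]: a["multiple_choice_answer"] for a in annotations}
for q in questions[:3]:
    print(q["image_id"], q["question"], "->", answers[q["question_id"]])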
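For VIST, the stories-in-sequence (SIS) annotations group five (photo, sentence) steps under one story id. The sketch below assumes the `train.story-in-sequence.json` file and field names from the official VIST release, where each entry of `"annotations"` is a one-element list describing a single story step.

```python
import json
from collections import defaultdict

# SIS annotation file from the official VIST release (assumed path).
with open("train.story-in-sequence.json") as f:
    sis = json.load(f)

# Group story steps by story_id; each step records the photo it
# describes and its position in the worker-arranged sequence.
stories = defaultdict(list)
for entry in sis["annotations"]:
    ann = entry[0]  # each entry is a single-element list
    stories[ann["story_id"]].append(
        (ann["worker_arranged_photo_order"], ann["photo_flickr_id"], ann["text"])
    )

# Print one story in photo order.
story_id, steps = next(iter(stories.items()))
for order, photo_id, text in sorted(steps):
    print(story_id, order, photo_id, text)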