- COCO Captions and Localized Narratives
The paper uses COCO Captions and Localized Narratives, two datasets used for generating image descriptions.
- VQA-CP v2 and VQA 2.0
The paper uses VQA-CP v2 and VQA 2.0, two standard datasets for visual question answering.