VQGCOCO

VQGCOCO is a Visual Question Generation (VQG) dataset built from MS COCO images: 2,500 training images, 1,250 validation images, and 1,250 test images (5,000 in total). Each image is paired with 5 questions and 5 ground-truth captions.
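As a quick sanity check on the figures above, the sketch below models one image record and derives the overall counts from the per-split sizes. The class name `VQGCOCOExample` and its fields (`image_id`, `questions`, `captions`) are hypothetical and only illustrate the per-image structure; they are not the dataset's actual file schema.

```python
from dataclasses import dataclass, field

# Split sizes as stated above (number of MS COCO images per split).
SPLIT_SIZES = {"train": 2500, "val": 1250, "test": 1250}

# Each image is paired with this many questions and ground-truth captions.
QUESTIONS_PER_IMAGE = 5
CAPTIONS_PER_IMAGE = 5


@dataclass
class VQGCOCOExample:
    """Hypothetical in-memory record for one image.

    Field names are illustrative only, not the dataset's real file format.
    """
    image_id: int
    questions: list[str] = field(default_factory=list)
    captions: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Every image should carry exactly 5 questions and 5 captions.
        if len(self.questions) != QUESTIONS_PER_IMAGE:
            raise ValueError(
                f"image {self.image_id}: expected {QUESTIONS_PER_IMAGE} "
                f"questions, got {len(self.questions)}"
            )
        if len(self.captions) != CAPTIONS_PER_IMAGE:
            raise ValueError(
                f"image {self.image_id}: expected {CAPTIONS_PER_IMAGE} "
                f"captions, got {len(self.captions)}"
            )


def split_totals(split_sizes: dict[str, int]) -> dict[str, int]:
    # Total images across all splits, then questions/captions per image.
    images = sum(split_sizes.values())
    return {
        "images": images,
        "questions": images * QUESTIONS_PER_IMAGE,
        "captions": images * CAPTIONS_PER_IMAGE,
    }


print(split_totals(SPLIT_SIZES))
# {'images': 5000, 'questions': 25000, 'captions': 25000}
```

The totals follow directly from the stated split sizes: 5,000 images, each with 5 questions and 5 captions, gives 25,000 of each.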

BibTeX: