CC3M, SBU Captions, Visual Genome, and COCO

The dataset used in the paper combines CC3M, SBU Captions, Visual Genome, and COCO.

BibTeX: