CC3M, SBU Captions, Visual Genome, and COCO

The dataset used in the paper is a combination of CC3M, SBU Captions, Visual Genome, and COCO.

Groups: Vision-Language Learning