COCO-Stuff and Visual Genome
The paper uses two datasets: COCO-Stuff and Visual Genome. COCO-Stuff contains 164K images with pixel-level stuff annotations, and Visual Genome contains 108,077 images with dense annotations of objects, attributes, and relationships.
BibTeX: