RefCOCO, RefCOCO+, and RefCOCOg

Visual Grounding is a task that aims to locate a target object according to a natural language expression. The dataset used in this paper is RefCOCO, RefCOCO+, and RefCOCOg.

BibTex: