CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Im...
Referring image segmentation aims to localize all pixels of the visual objects described by a natural language sentence.
RefCOCO, RefCOCO+, and RefCOCOg
Visual grounding is the task of locating a target object according to a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.