2 datasets found
Groups: Visual Grounding

ReferIt
Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.

RefCOCO
The dataset used in the paper is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.