Dataset - LDM

CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Im...

Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
- Dataset
ReferIt

Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
- Dataset
G-Ref

G-Ref is a dataset for referring image segmentation, comprising 104K referring language expressions for around 55K objects in about 27K images.
- Dataset
RefCOCO

RefCOCO is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.
- Dataset
RefCOCO, RefCOCO+, and RefCOCOg

Visual grounding is a task that aims to locate a target object according to a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.
- Dataset


5 datasets found