6 datasets found

Tags: visual grounding

  • BURCHAK corpus

    A freely available human-human dialogue dataset for interactive learning of visually grounded word meanings, in which a tutor teaches a learner through ostensive definitions.
  • SWiG

    SWiG (Situations With Groundings) is a large-scale grounded situation recognition dataset: given an image, the task is to predict the activity taking place, the entities filling its semantic roles, and their bounding boxes.
  • ReferIt

    ReferIt (also known as ReferItGame) is a dataset of natural-language referring expressions for objects in real-world images, collected through a two-player game. It is a standard benchmark for referring expression comprehension and segmentation, where the goal is to localize the object, or all of its pixels, described by a sentence.
  • RefCOCOg

    RefCOCOg is a referring expression dataset built on MS-COCO images, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
  • RefCOCO

    RefCOCO is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 objects in 19,994 MS-COCO images.
  • VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders

    VGDiffZero is a zero-shot visual grounding framework that leverages the vision-language alignment learned by pre-trained text-to-image diffusion models, localizing the object described by a language query without any task-specific training.
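Grounding benchmarks such as RefCOCO and RefCOCOg are commonly scored with Acc@0.5: a predicted bounding box counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. A minimal sketch of that metric follows; the box coordinates are illustrative values, not taken from any of the datasets above.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def acc_at_05(predictions, ground_truths):
    """Fraction of predictions whose IoU with the ground truth is >= 0.5."""
    hits = sum(iou(p, g) >= 0.5 for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)

# Illustrative boxes: the first prediction overlaps its target well,
# the second misses entirely.
preds = [(10, 10, 50, 50), (0, 0, 20, 20)]
gts = [(12, 12, 48, 52), (30, 30, 60, 60)]
print(acc_at_05(preds, gts))  # → 0.5
```

Production evaluation code typically uses a vectorized pairwise IoU (e.g. over arrays of boxes), but the per-pair scalar version above is the same computation.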