10 datasets found

Tags: visual grounding

  • BURCHAK corpus

    A freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner.
  • One-stage Visual Grounding

    A fast and accurate one-stage approach to visual grounding.
  • InstanceRefer

    Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring.
  • SWiG

    The SWiG dataset is a large-scale visual grounding dataset for grounded situation recognition: each image is annotated with an activity, its semantic roles, and bounding boxes localizing the entities that fill those roles.
  • ReferItGame

    Visual grounding is the task of localizing a language query in an image; the output is typically a bounding box around the referred object.
  • Flickr30K Entities

    The Flickr30K Entities dataset consists of 31,783 images, each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
  • ReferIt

    Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
  • RefCOCOg

    The RefCOCOg dataset is a reconstructed dataset of the MS-COCO dataset, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
  • RefCOCO

    RefCOCO is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.
  • VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders

    VGDiffZero is a zero-shot visual grounding framework that leverages pre-trained text-to-image diffusion models' vision-language alignment abilities.
You can also access this registry using the API (see API Docs).
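Several of the datasets above (ReferItGame, RefCOCO, RefCOCOg) frame grounding as box localization, which is conventionally scored by intersection-over-union (IoU) against the ground-truth box, with a prediction counted correct at IoU ≥ 0.5. A minimal sketch of that metric (the box coordinates are illustrative, not drawn from any of the datasets listed):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; clamp to zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

# Hypothetical predicted and ground-truth boxes for illustration.
pred = (10, 10, 50, 50)
gold = (20, 20, 60, 60)
score = iou(pred, gold)
print(f"IoU = {score:.3f}, correct at 0.5 threshold = {score >= 0.5}")
```

The same function underlies the standard "accuracy at IoU 0.5" number reported on these benchmarks; segmentation-style datasets such as ReferIt instead compute IoU over predicted and ground-truth pixel masks.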