BURCHAK corpus
A freely available human-human dialogue dataset for the interactive learning of visually grounded word meanings, in which a tutor teaches a learner through ostensive definitions.
3DVG-Transformer
A relation-aware transformer model for visual grounding on point clouds, locating objects in 3D scenes described by natural language.
SpeechCLIP
SpeechCLIP is a framework that integrates self-supervised speech models with a pre-trained vision-and-language model (CLIP), bridging speech and text through images without transcriptions.
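The core idea, roughly, is to align utterance-level speech embeddings with a frozen CLIP embedding space through a contrastive objective. The sketch below is a minimal illustration of such speech-image contrastive alignment; the encoder stand-ins, dimensions, and variable names are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch: contrastive alignment of speech embeddings with frozen
# image embeddings (CLIP-style). Encoders are random stand-ins here;
# dimensions and names are hypothetical.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch, speech_dim, clip_dim = 8, 768, 512

speech_features = torch.randn(batch, speech_dim)      # pooled self-supervised speech features
speech_proj = torch.nn.Linear(speech_dim, clip_dim)   # learned projection into CLIP space
image_embeds = torch.randn(batch, clip_dim)           # frozen CLIP image embeddings

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss over paired embeddings a[i] <-> b[i]."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature          # cosine-similarity logits
    targets = torch.arange(a.size(0))         # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(speech_proj(speech_features), image_embeds)
loss.backward()   # gradients flow only into the speech-side projection here
print(float(loss))
```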
VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders
VGDiffZero is a zero-shot visual grounding framework that leverages the vision-language alignment learned by pre-trained text-to-image diffusion models.
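In broad strokes, a diffusion-based zero-shot grounder scores each candidate region by how well a text-conditioned diffusion model explains it given the expression, then returns the best-scoring region. The snippet below only sketches that selection loop; `diffusion_alignment_score` is a hypothetical placeholder, not the paper's scoring function.

```python
# Minimal sketch of diffusion-based zero-shot grounding: score each region
# proposal against the expression and return the best one. The scoring
# function is a random placeholder for a real text-conditioned diffusion
# score (e.g., a denoising-error-based alignment measure).
import random
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def diffusion_alignment_score(image, box: Box, expression: str) -> float:
    """Hypothetical: higher means the region matches the expression better."""
    return random.random()

def ground_expression(image, proposals: List[Box], expression: str) -> Box:
    """Pick the proposal whose region best aligns with the expression."""
    return max(proposals, key=lambda b: diffusion_alignment_score(image, b, expression))

proposals = [(10, 20, 110, 220), (300, 40, 420, 200), (50, 60, 90, 150)]
print(ground_expression(None, proposals, "the man holding a red umbrella"))
```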
RefCOCO, RefCOCO+, and RefCOCOg
Visual grounding aims to locate a target object in an image according to a natural language expression. RefCOCO, RefCOCO+, and RefCOCOg are the standard referring-expression benchmarks for this task, all built on MS-COCO images.
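Grounding on these benchmarks is typically scored as accuracy at an intersection-over-union (IoU) threshold of 0.5 between the predicted and ground-truth boxes. Below is a small self-contained IoU and accuracy computation; the box format and example values are illustrative.

```python
# IoU-based grounding accuracy: a prediction counts as correct if its IoU
# with the ground-truth box is at least 0.5 (the usual threshold for
# RefCOCO-style evaluation). Boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def grounding_accuracy(predictions, ground_truths, threshold=0.5):
    correct = sum(iou(p, g) >= threshold for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)

preds = [(10, 10, 100, 100), (0, 0, 50, 50)]
gts   = [(12, 8, 105, 98), (100, 100, 200, 200)]
print(grounding_accuracy(preds, gts))  # 0.5: first box matches, second does not
```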
Visual Genome
The Visual Genome dataset is a large-scale dataset that connects images to structured annotations, containing over 108K images, each annotated with objects, attributes, and pairwise relationships, along with region descriptions and question-answer pairs.
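These annotations are usually consumed as per-image scene graphs: objects with bounding boxes, attributes attached to objects, and subject-predicate-object relationships. The snippet below shows one plausible in-memory representation and a simple lookup; the field names are simplified illustrations, not the official release schema.

```python
# Illustrative per-image scene-graph record in the spirit of Visual Genome
# annotations (objects, attributes, relationships). Field names are
# simplified and do not follow the official JSON schema.
scene_graph = {
    "image_id": 1,
    "objects": [
        {"object_id": 10, "name": "man",   "bbox": [30, 40, 120, 300],
         "attributes": ["tall", "smiling"]},
        {"object_id": 11, "name": "horse", "bbox": [150, 80, 400, 310],
         "attributes": ["brown"]},
    ],
    "relationships": [
        {"subject_id": 10, "predicate": "riding", "object_id": 11},
    ],
}

def describe(graph):
    """Turn subject-predicate-object triples into readable strings."""
    names = {o["object_id"]: o["name"] for o in graph["objects"]}
    return [f'{names[r["subject_id"]]} {r["predicate"]} {names[r["object_id"]]}'
            for r in graph["relationships"]]

print(describe(scene_graph))  # ['man riding horse']
```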