BURCHAK corpus
A new, freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner.
One-stage Visual Grounding
A fast and accurate one-stage approach to visual grounding.
InstanceRefer
Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring.
ReferItGame
Visual grounding is the task of localizing a language query in an image; the output is typically a bounding box around the region the query refers to (a minimal input/output sketch follows this list).
Flickr30K Entities
The Flickr30K Entities dataset consists of 31,783 images, each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders
VGDiffZero is a zero-shot visual grounding framework that leverages pre-trained text-to-image diffusion models' vision-language alignment abilities.
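To make the task described under ReferItGame concrete, here is a minimal, illustrative Python sketch of a grounding sample (image, language query, ground-truth box) and the intersection-over-union (IoU) score commonly used to judge a predicted box. The names GroundingExample and iou are hypothetical and not tied to any dataset or method listed above; the 0.5 IoU threshold is the usual accuracy convention in grounding benchmarks.

```python
from dataclasses import dataclass

@dataclass
class GroundingExample:
    """One visual grounding sample: an image, a language query, and the
    ground-truth box the query refers to, as (x_min, y_min, x_max, y_max) in pixels."""
    image_path: str
    query: str
    box: tuple[float, float, float, float]

def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union between two (x_min, y_min, x_max, y_max) boxes,
    the standard way to score a predicted grounding box against the ground truth."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter > 0 else 0.0

# A prediction is conventionally counted correct when IoU with the ground truth exceeds 0.5.
example = GroundingExample("image.jpg", "the dog on the left", (12.0, 40.0, 180.0, 220.0))
predicted_box = (10.0, 40.0, 175.0, 215.0)
print(iou(example.box, predicted_box))  # ~0.93, i.e. a correct grounding at the 0.5 threshold
```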