10 datasets found

Tags: visual grounding

  • BURCHAK corpus

    A freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner.
  • One-stage Visual Grounding

    A fast and accurate one-stage approach to visual grounding.
  • InstanceRefer

    Cooperative holistic understanding for visual grounding on point clouds through instance multi-level contextual referring.
  • SWiG

    The SWiG dataset is a large-scale visual grounding dataset for grounded situation recognition: each image is annotated with an activity, its semantic roles, and bounding boxes localizing the entities that fill those roles.
  • ReferItGame

    Visual grounding is the task of localizing a language query in an image; the output is typically a bounding box around the referred object.
  • Flickr30K Entities

    The Flickr30K Entities dataset consists of 31,783 images, each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
  • ReferIt

    Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
  • RefCOCOg

    The RefCOCOg dataset is a reconstructed dataset of the MS-COCO dataset, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
  • RefCOCO

    RefCOCO is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 referents in 19,994 images.
  • VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders

    VGDiffZero is a zero-shot visual grounding framework that leverages pre-trained text-to-image diffusion models' vision-language alignment abilities.
You can also access this registry using the API (see API Docs).
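Several of the datasets above (ReferItGame, RefCOCO, RefCOCOg) frame grounding as box localization, which is conventionally scored by intersection-over-union (IoU) against the ground-truth box, with a prediction counted correct at IoU ≥ 0.5. A minimal sketch of that metric (the box coordinates are illustrative, not drawn from any of the datasets listed):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap rectangle; clamp to zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union else 0.0

# Hypothetical predicted and ground-truth boxes for illustration.
pred = (10, 10, 50, 50)
gold = (20, 20, 60, 60)
score = iou(pred, gold)
print(f"IoU = {score:.3f}, correct at 0.5 threshold = {score >= 0.5}")
```

The same function underlies the standard "accuracy at IoU 0.5" number reported on these benchmarks; segmentation-style datasets such as ReferIt instead compute IoU over predicted and ground-truth pixel masks.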