6 datasets found

Tags: visual grounding

  • BURCHAK corpus

    A freely available human-human dialogue dataset for interactive learning of visually grounded word meanings, in which a tutor teaches a learner through ostensive definitions.
  • SWiG

    SWiG (Situations With Groundings) is a large-scale grounded situation recognition dataset: given an image, the task is to predict the activity taking place, the entities filling its semantic roles, and their bounding boxes.
  • ReferIt

    ReferIt (also known as ReferItGame) is a dataset of natural-language referring expressions for objects in real-world images, collected through a two-player game. It is a standard benchmark for referring expression comprehension and segmentation, where the goal is to localize the object, or all of its pixels, described by a sentence.
  • RefCOCOg

    RefCOCOg is a referring expression dataset built on MS-COCO images, containing 85,474 referring expressions for 54,822 objects in 26,711 images.
  • RefCOCO

    RefCOCO is a benchmark for referring expression grounding, containing 142,210 referring expressions for 50,000 objects in 19,994 MS-COCO images.
  • VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders

    VGDiffZero is a zero-shot visual grounding framework that leverages the vision-language alignment learned by pre-trained text-to-image diffusion models, localizing the object described by a language query without any task-specific training.
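Grounding benchmarks such as RefCOCO and RefCOCOg are commonly scored with Acc@0.5: a predicted bounding box counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. A minimal sketch of that metric follows; the box coordinates are illustrative values, not taken from any of the datasets above.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def acc_at_05(predictions, ground_truths):
    """Fraction of predictions whose IoU with the ground truth is >= 0.5."""
    hits = sum(iou(p, g) >= 0.5 for p, g in zip(predictions, ground_truths))
    return hits / len(ground_truths)

# Illustrative boxes: the first prediction overlaps its target well,
# the second misses entirely.
preds = [(10, 10, 50, 50), (0, 0, 20, 20)]
gts = [(12, 12, 48, 52), (30, 30, 60, 60)]
print(acc_at_05(preds, gts))  # → 0.5
```

Production evaluation code typically uses a vectorized pairwise IoU (e.g. over arrays of boxes), but the per-pair scalar version above is the same computation.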