-
BURCHAK corpus
A new, freely available human-human dialogue data set for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner.
-
VGDiffZero: Text-to-Image Diffusion Models Can Be Zero-Shot Visual Grounders
VGDiffZero is a zero-shot visual grounding framework that leverages the vision-language alignment abilities of pre-trained text-to-image diffusion models.