Rel3D is a large-scale dataset of human-annotated spatial relations in 3D. It consists of spatial relations situated in synthetic 3D scenes, making it possible to extract rich...
Visual Spatial Reasoning (VSR) is a controlled probing dataset for testing the ability of vision-language models to recognize and reason about spatial relations in natural...