4 datasets found

  • RefCOCO+ and RefCOCOg

    RefCOCO+ and RefCOCOg are benchmarks for referring expression comprehension. Each pairs images from MS COCO with natural-language expressions that refer to specific objects in those images.
  • Cap3D Objaverse

    Cap3D Objaverse is a dataset of 660K 3D-text pairs, created using an automated captioning process.
  • Text2Shape

    Text2Shape is a dataset of 8,447 table instances and 6,591 chair instances from ShapeNet, paired with 75,344 natural-language descriptions.
  • ScanRefer

    ScanRefer is a dataset of 51,583 referring descriptions of 11,046 objects from 800 ScanNet scenes.