- RefCOCO+ and RefCOCOg
  The RefCOCO+ and RefCOCOg datasets are benchmarks for referring expression comprehension. They contain images of objects and natural language descriptions of the objects.
- Cap3D Objaverse
  Cap3D Objaverse is a dataset of 660K 3D-text pairs, created using an automated captioning process.
- Text2Shape
  Text2Shape is a dataset of 8,447 table instances and 6,591 chair instances from the ShapeNet dataset, along with 75,344 natural language descriptions.