Uni3DL: Unified Model for 3D and Language Understanding

Uni3DL is a unified model for 3D and language understanding. It operates directly on point clouds and supports a range of 3D vision-language tasks: semantic segmentation, object detection, instance segmentation, grounded segmentation, captioning, text-3D cross-modal retrieval, and zero-shot 3D object classification.

Cite this as

Xiang Li, Jian Ding, Zhaoyang Chen, Mohamed Elhoseiny (2024). Dataset: Uni3DL: Unified Model for 3D and Language Understanding. https://doi.org/10.57702/qhrzeun0

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.2312.03026
Author: Xiang Li
More Authors: Jian Ding, Zhaoyang Chen, Mohamed Elhoseiny
Homepage: https://uni3dl.github.io/