Uni3DL: Unified Model for 3D and Language Understanding
Uni3DL is a unified model for 3D and language understanding. It operates directly on point clouds and supports a range of 3D vision-language tasks: semantic segmentation, object detection, instance segmentation, grounded segmentation, captioning, text-3D cross-modal retrieval, and zero-shot 3D object classification.
BibTeX: