Dataset - LDM

Cap3D Objaverse

Cap3D Objaverse is a dataset of 660K 3D-text pairs, created using an automated captioning process.
Uni3DL: Unified Model for 3D and Language Understanding

Uni3DL is a unified model for 3D and language understanding. It operates directly on point clouds and supports diverse 3D vision-language tasks, including semantic segmentation,...
WavCaps

The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
Multi30k

The Multi30k dataset is an extension of the Flickr30k dataset, containing 29,000 training images, 1,014 validation images and 1,000 test images. Each image is accompanied by six...
Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a large collection of images with associated captions.
MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods rely on image-to-video transfer learning...
MSR-VTT

The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. It contains 10k video clips with lengths varying from 10 to...
MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

You can also access this registry using the API (see API Docs).

8 datasets found