8 datasets found

Tags: Captioning

  • Cap3D Objaverse

    Cap3D Objaverse is a dataset of 660K 3D-text pairs, created using an automated captioning process.
  • Uni3DL: Unified Model for 3D and Language Understanding

    Uni3DL is a unified model for 3D and language understanding. It operates directly on point clouds and supports diverse 3D vision-language tasks, including semantic segmentation,...
  • WavCaps

    The WavCaps dataset contains ChatGPT-assisted, weakly labeled audio captioning data.
  • Multi30k

    The Multi30k dataset is an extension of the Flickr30k dataset, containing 29,000 training images, 1,014 validation images, and 1,000 test images. Each image is accompanied by six...
  • Flickr30k

    The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.
  • MSVD

    Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...
  • MSR-VTT

    MSR-VTT is a large video description dataset for bridging video and language. The dataset contains 10k video clips with lengths varying from 10 to...
  • MSCOCO

    Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...
You can also access this registry using the API (see API Docs).