5 datasets found

Tags: cross-modal retrieval

  • CC14M

    Large-scale image-text dataset for pre-training a collaborative two-stream vision-language model for cross-modal retrieval.
  • CC4M

    Large-scale image-text dataset for pre-training a collaborative two-stream vision-language model for cross-modal retrieval.
  • ALADIN

    The ALADIN dataset is a custom dataset created for the ALADIN paper.
  • Flickr30k

    The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.
  • MS-COCO

    Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
You can also access this registry using the API (see API Docs).