4 datasets found

Tags: vision and language

Filter Results
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.
  • SBU Captions

    The SBU Captions dataset is a large-scale image-text dataset used for vision-language pre-training.
  • Visual Genome

    The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships.
  • MS-COCO

    Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...
You can also access this registry using the API (see API Docs).