- Conceptual Captions 3M
The Conceptual Captions 3M (CC-3M) dataset is a large-scale image captioning dataset.
- Class-Conditional Self-Rewarding for Text-to-Image Models
A self-rewarding mechanism for text-to-image models that uses image captioning methods.
- BanglaLekhaImageCaptions dataset
The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions per image.
- Conceptual Captions
The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training", where it serves supervised image classification and zero-shot classification tasks.
- Image COCO
The Image COCO dataset's image caption annotations, from which we sample 10,000 sentences as the training set and another 10,000 as the test set.
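The Image COCO split described above can be sketched as follows. This is a minimal illustration, assuming the captions are already available as a plain list of sentences; the function name and the toy placeholder captions are hypothetical, and only the split sizes come from the text.

```python
import random

def make_caption_splits(captions, n_train=10_000, n_test=10_000, seed=0):
    """Sample disjoint train/test caption sets, as described for Image COCO."""
    rng = random.Random(seed)
    # One draw without replacement guarantees train and test are disjoint.
    sampled = rng.sample(captions, n_train + n_test)
    return sampled[:n_train], sampled[n_train:]

# Toy usage with placeholder captions; real captions would be loaded from
# the COCO annotation files, whose format is not specified here.
captions = [f"caption {i}" for i in range(30_000)]
train, test = make_caption_splits(captions)
```

Drawing both subsets in a single `sample` call is what keeps the two sets disjoint without an explicit overlap check.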
- Conceptual 12M
The Conceptual 12M dataset for automatic image captioning.
- RedCaps: Web-curated image-text data created by the people, for the people
A dataset of web-curated image-text pairs.
- CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
A dataset of Creative-Commons-licensed images used to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2).
- RefCOCO, RefCOCO+, and RefCOCOg
Visual grounding aims to locate a target object according to a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.
- COCO Dataset
The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,...
- Microsoft COCO
The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...