Image Captioning Task
The dataset used in the paper targets the image captioning task.
High Quality Image-Text Pairs (HQITP)
The High Quality Image-Text Pairs (HQITP) dataset contains 134M high-quality image-caption pairs.
ClipCap: CLIP Prefix for Image Captioning
Image captioning is a fundamental task in vision-language understanding, in which a model predicts an informative textual caption for a given input image.
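The ClipCap approach itself maps a CLIP image embedding to a prefix of token embeddings that conditions a GPT-2 decoder. Below is a minimal sketch of that prefix idea using Hugging Face checkpoints; the single linear mapper is untrained here and stands in for the mapping network that ClipCap trains on caption data, and the checkpoint names and decoding settings are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor, GPT2LMHeadModel, GPT2Tokenizer

# Sketch of the ClipCap prefix idea: CLIP image features -> prefix embeddings -> GPT-2.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

prefix_length = 10
# ClipCap trains this mapping network; a single Linear layer is a stand-in here.
mapper = nn.Linear(clip.config.projection_dim, prefix_length * gpt2.config.n_embd)

def caption(image):
    pixel = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        feats = clip.get_image_features(**pixel)           # (1, proj_dim)
        prefix = mapper(feats).view(1, prefix_length, -1)  # (1, L, n_embd)
        # Greedy decoding conditioned only on the prefix embeddings
        # (generate() accepts inputs_embeds in recent transformers versions).
        out = gpt2.generate(inputs_embeds=prefix, max_new_tokens=20)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```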
Good News, everyone!
The dataset is used in the paper to evaluate the effectiveness of context-driven entity-aware captioning.
Winoground
The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0) and (I1, C1); both captions use the same words in a different order, so matching each caption to its image requires compositional visio-linguistic reasoning.
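The standard Winoground metrics score a model by whether it matches each caption to its own image within an item. A minimal sketch, assuming a model-supplied similarity function sim(image, caption):

```python
def winoground_scores(items, sim):
    """items: list of (I0, C0, I1, C1) tuples; sim(image, caption) -> float."""
    text = image = group = 0
    for i0, c0, i1, c1 in items:
        # Text score: each image prefers its own caption.
        t = sim(i0, c0) > sim(i0, c1) and sim(i1, c1) > sim(i1, c0)
        # Image score: each caption prefers its own image.
        v = sim(i0, c0) > sim(i1, c0) and sim(i1, c1) > sim(i0, c1)
        text += t
        image += v
        group += t and v  # Group score: both directions must be correct.
    n = len(items)
    return text / n, image / n, group / n
```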
CC12M dataset
The CC12M dataset is used for training and testing the proposed method. It contains 12 million image-caption pairs.
Flickr8K-Expert dataset
The Flickr8K-Expert dataset is used for evaluating the proposed method. It provides expert human judgments of caption quality for the 8,000 images of Flickr8K.
Concept Conjunction 500 (CC-500)
The Concept Conjunction 500 (CC-500) dataset is a benchmark for text-to-image synthesis, consisting of 500 text prompts that each conjoin two concepts, used to test whether generated images bind each attribute to the correct object.
Attribute Binding Contrast (ABC-6K)
The Attribute Binding Contrast (ABC-6K) dataset is a benchmark for text-to-image synthesis, consisting of 6,000 natural text prompts that each mention multiple attribute-object pairs, arranged in contrastive pairs with swapped attribute modifiers.
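Both CC-500 and ABC-6K are prompt-only benchmarks: quality is judged on images synthesized from the prompts. A minimal sketch of running CC-500-style conjunction prompts through a public text-to-image pipeline; the prompt template and model checkpoint are illustrative assumptions, not the benchmark's actual prompt list.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative conjunction prompts in the CC-500 style
# ("a <color> <object> and a <color> <object>"); the real prompt list differs.
colors = ["red", "green", "blue"]
objects = ["car", "sheep", "backpack"]
combos = [(c, o) for c in colors for o in objects]
prompts = [f"a {c1} {o1} and a {c2} {o2}"
           for (c1, o1) in combos for (c2, o2) in combos if o1 != o2]

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for prompt in prompts[:3]:
    image = pipe(prompt).images[0]  # one sample per prompt
    image.save(prompt.replace(" ", "_") + ".png")
```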
Composite Dataset
The Composite dataset contains 11,985 human judgments over Flickr8K, Flickr30K, and COCO captions.
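Judgment sets like this are typically used to validate captioning metrics by correlating metric scores with the human ratings. A minimal sketch with scipy; the score lists are hypothetical placeholders for aligned (metric, human) scores over the same image-caption pairs:

```python
from scipy.stats import kendalltau, spearmanr

# Hypothetical aligned lists: one metric score and one human judgment
# per (image, candidate caption) pair.
metric_scores = [0.71, 0.42, 0.88, 0.15]
human_scores = [4.0, 2.5, 5.0, 1.0]

tau, tau_p = kendalltau(metric_scores, human_scores)
rho, rho_p = spearmanr(metric_scores, human_scores)
print(f"Kendall tau: {tau:.3f} (p={tau_p:.3g}), Spearman rho: {rho:.3f}")
```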
Flickr30K Entities
The Flickr30K Entities dataset consists of 31,783 images, each paired with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
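The entity-to-box links can be pictured as a simple record structure. The layout below is hypothetical and for illustration only; the released dataset actually ships annotated sentence files plus separate per-image box annotations.

```python
from dataclasses import dataclass

# Hypothetical in-memory layout for Flickr30K Entities records.
@dataclass
class EntityMention:
    phrase: str       # e.g. "a man in a red shirt"
    entity_id: int    # coreference chain id shared across captions
    boxes: list       # [(x_min, y_min, x_max, y_max), ...] in pixels

@dataclass
class CaptionedImage:
    image_file: str
    captions: list    # 5 caption strings
    mentions: list    # EntityMention objects grounded in the image
```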
Microsoft COCO: common objects in context
The COCO dataset is a large-scale dataset for object detection, segmentation, and image captioning.
LAION-Improved-Aesthetics (v1.2)
The LAION-Improved-Aesthetics (v1.2) dataset was used to train the Stable Diffusion model; it consists of image-caption pairs filtered for predicted aesthetic quality.
Conceptual Captions 12M
The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
COCO-captions dataset
The COCO-captions dataset contains ∼120k RGB images with text captions.
MS COCO dataset
The MS COCO dataset is a large benchmark for image captioning, containing 328K images, each with five reference captions.
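The reference captions can be read with the standard pycocotools API. A minimal sketch, assuming the COCO 2017 captions annotation file has been downloaded to annotations/ (the path is an assumption; adjust locally):

```python
from pycocotools.coco import COCO

# Load the captions annotation file.
coco = COCO("annotations/captions_val2017.json")

img_id = coco.getImgIds()[0]             # pick the first image
ann_ids = coco.getAnnIds(imgIds=img_id)  # its caption annotation ids
for ann in coco.loadAnns(ann_ids):       # typically five captions
    print(ann["caption"])
```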