- LAION-Improved-Aesthetics (v1.2)
The LAION-Improved-Aesthetics (v1.2) dataset, used for training the Stable Diffusion model, includes images with captions.
- Language models are few-shot learners
A language model that demonstrates capabilities in processing and generating human-like text.
- Prompt Highlighter
Prompt Highlighter is a novel paradigm for user-model interactions in multi-modal LLMs, offering output control through a token-level highlighting mechanism.
- Conceptual Captions 12M
The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
- Conceptual Caption 3M
The Conceptual Caption 3M (CC-3M) dataset is a large-scale image captioning dataset.
- COCO-captions dataset
The COCO-captions dataset contains ∼120k RGB images with text captions.
- BanglaLekhaImageCaptions dataset
The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions for each image.
- MS COCO dataset
The MS COCO dataset is a large benchmark for image captioning, containing 328K images with 5 caption descriptions each.
- Split MS-COCO
The paper uses the Split MS-COCO dataset, which provides a comprehensive framework for continual image captioning.
- Conceptual Captions 12M and RedCaps
The datasets used in the paper are Conceptual Captions 12M (CC12M) and RedCaps.
- Conceptual Captions 3M, Conceptual Captions 12M, RedCaps, and LAION-400M
The datasets used in the paper are Conceptual Captions 3M (CC3M), Conceptual Captions 12M (CC12M), RedCaps, and LAION-400M.
- Conceptual Captions
Conceptual Captions is the dataset used in the paper "Scaling Laws of Synthetic Images for Model Training", where it is used for supervised image classification and zero-shot classification tasks.
- Conceptual Captions 3M
The Conceptual Captions 3M dataset is a large-scale image-text dataset used for vision-language pre-training.