Image Captioning and Visual Question Answering
The datasets below are used for image captioning and visual question answering.
Flickr 8k Dataset
The Flickr8k dataset is a widely used benchmark for image captioning. It contains 8,000 images, each annotated with five human-written captions.
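A minimal sketch of loading the annotations, assuming the standard Flickr8k.token.txt distribution format in which each line carries an image filename, a caption index (#0 to #4), and the caption text separated by a tab:

```python
from collections import defaultdict

def load_flickr8k_captions(path="Flickr8k.token.txt"):  # assumed file name from the standard release
    """Parse lines like '1000268201_693b08cb0e.jpg#0<TAB>A child in a pink dress ...'
    into a dict mapping image filename -> list of its 5 captions."""
    captions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            image_id, caption = line.rstrip("\n").split("\t", 1)
            image_name = image_id.split("#")[0]  # drop the '#0'..'#4' caption index
            captions[image_name].append(caption)
    return dict(captions)

# Usage: each image should map to its five human-written captions.
# captions = load_flickr8k_captions()
```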
Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments. Second, each metric has known blind spots to pathological caption constructions.
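As a concrete illustration of the n-gram overlap these metrics measure, here is a small sketch scoring a candidate caption with NLTK's sentence-level BLEU; the captions are invented, and this is plain BLEU, not the learned metric proposed in the paper:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a child in a pink dress is climbing up a set of stairs".split(),
    "a little girl climbing the stairs to her playhouse".split(),
]
candidate = "a girl climbing the stairs".split()

# BLEU-4 with smoothing; short captions make higher-order n-gram
# matches sparse, one reason BLEU can diverge from human judgments.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```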
Microsoft COCO 2014 and 2017
The Microsoft COCO 2014 and 2017 datasets support object detection, segmentation, and image captioning.
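A minimal sketch of reading COCO captions with the pycocotools API; the annotation path reflects the standard 2017 release layout and is an assumption about the local setup:

```python
from pycocotools.coco import COCO

# Standard layout of the 2017 release; adjust to your local paths.
coco = COCO("annotations/captions_val2017.json")

img_id = coco.getImgIds()[0]             # first image in the split
ann_ids = coco.getAnnIds(imgIds=img_id)  # its caption annotation ids
for ann in coco.loadAnns(ann_ids):       # typically 5 captions per image
    print(ann["caption"])
```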
TextCaps: A dataset for image captioning with reading comprehension
TextCaps contains 145k captions for 28k images and requires models to read and reason about text appearing in images in order to generate captions.
Twitter Alt-Text Dataset
A dataset of 371k images paired with their alt-text and the tweets they appeared in, scraped from Twitter; it is used for alt-text generation.
Crisscrossed Captions
The Crisscrossed Captions (CxC) dataset extends MS-COCO with human semantic similarity ratings over image-image, image-text, and text-text pairs; it has been used for training and evaluating the MURAL model.
UMIC: An unreferenced metric for image captioning via contrastive learning
UMIC is an unreferenced metric that scores a candidate caption directly against the image, without reference captions, and is trained via contrastive learning.
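UMIC itself fine-tunes a pretrained vision-and-language model, so the following is only a generic sketch of the InfoNCE-style contrastive objective that such unreferenced metrics build on; the encoders producing these embeddings, and all shapes, are hypothetical:

```python
import torch
import torch.nn.functional as F

def contrastive_caption_loss(image_emb, pos_cap_emb, neg_cap_embs, tau=0.07):
    """InfoNCE-style loss: pull the human caption toward its image,
    push synthetic negative captions away. image_emb: (d,),
    pos_cap_emb: (d,), neg_cap_embs: (k, d); encoders are hypothetical."""
    image_emb = F.normalize(image_emb, dim=-1)
    caps = torch.cat([pos_cap_emb.unsqueeze(0), neg_cap_embs], dim=0)
    caps = F.normalize(caps, dim=-1)           # (1+k, d)
    logits = caps @ image_emb / tau            # similarity of each caption to the image
    target = torch.zeros(1, dtype=torch.long)  # the positive caption sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# At test time the learned image-caption similarity itself serves as the
# caption score, with no reference captions required.
```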
Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning
Conceptual Captions pairs web images with alt-text captions that have been cleaned and hypernymed (proper names replaced with hypernyms) for automatic image captioning.
Flickr30K-EE
Explicit Caption Editing (ECE) refines reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE) and has attracted significant attention. Flickr30K-EE is a benchmark for this task built on Flickr30K.
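A toy sketch of applying explicit edit operations to a tokenized reference caption; the operation set and the ADD convention are illustrative assumptions rather than the exact Flickr30K-EE specification:

```python
def apply_edits(tokens, ops):
    """Apply a sequence of explicit edit operations to a reference caption.
    Each op is KEEP / DELETE over the current token, or ("ADD", word)
    to insert a word (the ADD convention here is an assumption)."""
    out, i = [], 0
    for op in ops:
        if op == "KEEP":
            out.append(tokens[i]); i += 1
        elif op == "DELETE":
            i += 1
        elif isinstance(op, tuple) and op[0] == "ADD":
            out.append(op[1])      # insert without consuming a reference token
    out.extend(tokens[i:])         # keep any remaining tokens
    return out

ref = "a man rides a horse".split()
edited = apply_edits(ref, ["KEEP", "KEEP", "DELETE",
                           ("ADD", "is"), ("ADD", "riding"), "KEEP", "KEEP"])
print(" ".join(edited))  # -> "a man is riding a horse"
```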
CrowdCaption Dataset
The CrowdCaption dataset contains 11,161 images with 21,794 group regions and 43,306 group captions; each group has an average of two captions.
Conceptual Captions 12M
The Conceptual Captions 12M (CC-12M) dataset consists of about 12 million diverse image-text pairs, collected by relaxing the filtering pipeline used for Conceptual Captions 3M.
Conceptual Captions 3M
The Conceptual Captions 3M (CC-3M) dataset contains roughly 3.3 million images paired with captions derived from web alt-text and filtered through an automatic cleaning pipeline.
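The official splits are distributed as tab-separated caption/URL pairs, with images downloaded separately; a minimal sketch of iterating over such a TSV (the file name follows the official release but is an assumption):

```python
import csv

def iter_cc3m(path="Train_GCC-training.tsv"):  # assumed file name from the official release
    """Yield (caption, image_url) pairs from a Conceptual Captions TSV.
    Images are not bundled with the dataset and must be fetched from
    each URL separately (many links rot over time)."""
    with open(path, encoding="utf-8", newline="") as f:
        for caption, url in csv.reader(f, delimiter="\t"):
            yield caption, url

for caption, url in iter_cc3m():
    print(caption, "->", url)
    break
```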
BanglaLekhaImageCaptions dataset
The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions per image.
Conceptual Captions
Conceptual Captions as used in the paper "Scaling Laws of Synthetic Images for Model Training", where it supports supervised and zero-shot image classification experiments.
Conceptual 12M
Conceptual 12M is a dataset for automatic image captioning.
Redcaps: Web-curated image-text data created by the people, for the people
RedCaps contains 12 million image-text pairs collected from Reddit, where both the images and their captions are posted and curated by users.