Image Captioning and Visual Question Answering
The datasets below are used for image captioning and visual question answering.
Flickr 8k Dataset
The Flickr8k dataset is a widely used benchmark for image captioning. It contains 8,000 images, each annotated with five human-written captions.
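A minimal sketch of loading the annotations, assuming the standard Flickr8k.token.txt distribution format in which each line carries an image filename, a caption index (#0 to #4), and the caption text separated by a tab:

```python
from collections import defaultdict

def load_flickr8k_captions(path="Flickr8k.token.txt"):  # assumed file name from the standard release
    """Parse lines like '1000268201_693b08cb0e.jpg#0<TAB>A child in a pink dress ...'
    into a dict mapping image filename -> list of its 5 captions."""
    captions = defaultdict(list)
    with open(path, encoding="utf-8") as f:
        for line in f:
            image_id, caption = line.rstrip("\n").split("\t", 1)
            image_name = image_id.split("#")[0]  # drop the '#0'..'#4' caption index
            captions[image_name].append(caption)
    return dict(captions)

# Usage: each image should map to its five human-written captions.
# captions = load_flickr8k_captions()
```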
Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments. Second, each metric has known blind spots to pathological caption constructions.
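As a concrete illustration of the n-gram overlap these metrics measure, here is a small sketch scoring a candidate caption with NLTK's sentence-level BLEU; the captions are invented, and this is plain BLEU, not the learned metric proposed in the paper:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a child in a pink dress is climbing up a set of stairs".split(),
    "a little girl climbing the stairs to her playhouse".split(),
]
candidate = "a girl climbing the stairs".split()

# BLEU-4 with smoothing; short captions make higher-order n-gram
# matches sparse, one reason BLEU can diverge from human judgments.
score = sentence_bleu(references, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.3f}")
```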
Microsoft COCO 2014 and 2017
The Microsoft COCO 2014 and 2017 datasets support object detection, segmentation, and image captioning.
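A minimal sketch of reading COCO captions with the pycocotools API; the annotation path reflects the standard 2017 release layout and is an assumption about the local setup:

```python
from pycocotools.coco import COCO

# Standard layout of the 2017 release; adjust to your local paths.
coco = COCO("annotations/captions_val2017.json")

img_id = coco.getImgIds()[0]             # first image in the split
ann_ids = coco.getAnnIds(imgIds=img_id)  # its caption annotation ids
for ann in coco.loadAnns(ann_ids):       # typically 5 captions per image
    print(ann["caption"])
```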
TextCaps: A dataset for image captioning with reading comprehension
TextCaps contains 145k captions for 28k images and requires models to read and reason about text appearing in images in order to generate captions.
Twitter Alt-Text Dataset
A dataset of 371k images paired with their alt-text and the tweets they appeared in, scraped from Twitter; it is used for alt-text generation.
Crisscrossed Captions
The Crisscrossed Captions (CxC) dataset extends MS-COCO with human semantic similarity ratings over image-image, image-text, and text-text pairs; it has been used for training and evaluating the MURAL model.
UMIC: An unreferenced metric for image captioning via contrastive learning
UMIC is an unreferenced metric that scores a candidate caption directly against the image, without reference captions, and is trained via contrastive learning.
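UMIC itself fine-tunes a pretrained vision-and-language model, so the following is only a generic sketch of the InfoNCE-style contrastive objective that such unreferenced metrics build on; the encoders producing these embeddings, and all shapes, are hypothetical:

```python
import torch
import torch.nn.functional as F

def contrastive_caption_loss(image_emb, pos_cap_emb, neg_cap_embs, tau=0.07):
    """InfoNCE-style loss: pull the human caption toward its image,
    push synthetic negative captions away. image_emb: (d,),
    pos_cap_emb: (d,), neg_cap_embs: (k, d); encoders are hypothetical."""
    image_emb = F.normalize(image_emb, dim=-1)
    caps = torch.cat([pos_cap_emb.unsqueeze(0), neg_cap_embs], dim=0)
    caps = F.normalize(caps, dim=-1)           # (1+k, d)
    logits = caps @ image_emb / tau            # similarity of each caption to the image
    target = torch.zeros(1, dtype=torch.long)  # the positive caption sits at index 0
    return F.cross_entropy(logits.unsqueeze(0), target)

# At test time the learned image-caption similarity itself serves as the
# caption score, with no reference captions required.
```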
Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning
Conceptual Captions pairs web images with alt-text captions that have been cleaned and hypernymed (proper names replaced with hypernyms) for automatic image captioning.
Flickr30K-EE
Explicit Caption Editing (ECE) refines reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE) and has attracted significant attention. Flickr30K-EE is a benchmark for this task built on Flickr30K.
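A toy sketch of applying explicit edit operations to a tokenized reference caption; the operation set and the ADD convention are illustrative assumptions rather than the exact Flickr30K-EE specification:

```python
def apply_edits(tokens, ops):
    """Apply a sequence of explicit edit operations to a reference caption.
    Each op is KEEP / DELETE over the current token, or ("ADD", word)
    to insert a word (the ADD convention here is an assumption)."""
    out, i = [], 0
    for op in ops:
        if op == "KEEP":
            out.append(tokens[i]); i += 1
        elif op == "DELETE":
            i += 1
        elif isinstance(op, tuple) and op[0] == "ADD":
            out.append(op[1])      # insert without consuming a reference token
    out.extend(tokens[i:])         # keep any remaining tokens
    return out

ref = "a man rides a horse".split()
edited = apply_edits(ref, ["KEEP", "KEEP", "DELETE",
                           ("ADD", "is"), ("ADD", "riding"), "KEEP", "KEEP"])
print(" ".join(edited))  # -> "a man is riding a horse"
```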
CrowdCaption Dataset
The CrowdCaption dataset contains 11,161 images with 21,794 group regions and 43,306 group captions; each group has an average of two captions.
Conceptual Captions 12M
The Conceptual Captions 12M (CC-12M) dataset consists of about 12 million diverse image-text pairs, collected by relaxing the filtering pipeline used for Conceptual Captions 3M.
Conceptual Captions 3M
The Conceptual Captions 3M (CC-3M) dataset contains roughly 3.3 million images paired with captions derived from web alt-text and filtered through an automatic cleaning pipeline.
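The official splits are distributed as tab-separated caption/URL pairs, with images downloaded separately; a minimal sketch of iterating over such a TSV (the file name follows the official release but is an assumption):

```python
import csv

def iter_cc3m(path="Train_GCC-training.tsv"):  # assumed file name from the official release
    """Yield (caption, image_url) pairs from a Conceptual Captions TSV.
    Images are not bundled with the dataset and must be fetched from
    each URL separately (many links rot over time)."""
    with open(path, encoding="utf-8", newline="") as f:
        for caption, url in csv.reader(f, delimiter="\t"):
            yield caption, url

for caption, url in iter_cc3m():
    print(caption, "->", url)
    break
```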
BanglaLekhaImageCaptions dataset
The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions per image.
Conceptual Captions
Conceptual Captions as used in the paper "Scaling Laws of Synthetic Images for Model Training", where it supports supervised and zero-shot image classification experiments.
Conceptual 12M
Conceptual 12M is a dataset for automatic image captioning.
Redcaps: Web-curated image-text data created by the people, for the people
RedCaps contains 12 million image-text pairs collected from Reddit, where both the images and their captions are posted and curated by users.