Remote Sensing Image Captioning
The Remote Sensing Image Captioning Dataset (RSICD) and the UCM-Captions dataset are benchmark datasets for remote sensing image captioning.
Conceptual Captions 3.3M
Conceptual Captions 3.3M is a large-scale dataset of roughly 3.3 million images, each paired with a single caption derived from web alt-text.
SBU Captioned Photos
The SBU Captioned Photos (SBU) dataset consists of 1 million images with associated visually relevant captions.
ClipCap: CLIP Prefix for Image Captioning
Image captioning is a fundamental task in vision-language understanding, in which a model produces an informative textual caption for a given input image.
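ClipCap's core idea is to map a CLIP image embedding to a short "prefix" of pseudo-token embeddings that a GPT-2-style decoder then conditions on. A minimal numpy sketch of that mapping step, assuming the common ViT-B/32 (512-d) and GPT-2 (768-d) dimensions; the weights here are random placeholders, not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)

CLIP_DIM, GPT2_DIM, PREFIX_LEN = 512, 768, 10

# Random placeholder weights for a small two-layer mapping network.
W1 = rng.standard_normal((CLIP_DIM, 1024)) * 0.02
b1 = np.zeros(1024)
W2 = rng.standard_normal((1024, PREFIX_LEN * GPT2_DIM)) * 0.02
b2 = np.zeros(PREFIX_LEN * GPT2_DIM)

def clip_to_prefix(clip_embedding: np.ndarray) -> np.ndarray:
    """Map one CLIP embedding (512,) to a prefix (PREFIX_LEN, GPT2_DIM)."""
    h = np.tanh(clip_embedding @ W1 + b1)          # hidden layer
    prefix = h @ W2 + b2                           # flat prefix vector
    return prefix.reshape(PREFIX_LEN, GPT2_DIM)    # one row per pseudo-token

prefix = clip_to_prefix(rng.standard_normal(CLIP_DIM))
print(prefix.shape)  # (10, 768)
```

In the actual method, these prefix rows are concatenated in front of the caption token embeddings and the language model is trained (or kept frozen, with only the mapper trained) to continue from them.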
Good News, everyone!
GoodNews is a news image captioning dataset, used in the paper to evaluate the effectiveness of context-driven, entity-aware captioning.
Winoground
The Winoground dataset consists of 400 items, each containing two image-caption pairs (I0, C0) and (I1, C1); the two captions contain the same words in a different order, and a model must match each caption to its correct image.
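Winoground scores a model per item with three criteria: a text score (given each image, the correct caption scores higher), an image score (given each caption, the correct image scores higher), and a group score (both hold). A small sketch, taking a 2x2 similarity table `sim[image][caption]` from any image-text model:

```python
# sim[i][c] is the model's similarity between image Ii and caption Cc.
def winoground_scores(sim):
    text = sim[0][0] > sim[0][1] and sim[1][1] > sim[1][0]   # per image, right caption wins
    image = sim[0][0] > sim[1][0] and sim[1][1] > sim[0][1]  # per caption, right image wins
    group = text and image
    return text, image, group

# A model that matches both pairs correctly:
print(winoground_scores([[0.9, 0.2], [0.1, 0.8]]))  # (True, True, True)
# A model fooled by the swapped word order:
print(winoground_scores([[0.5, 0.6], [0.7, 0.4]]))  # (False, False, False)
```

Dataset-level accuracy is simply the fraction of the 400 items passing each criterion.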
Twitter Alt-Text Dataset
A dataset of 371k images paired with alt-text and the accompanying tweets, scraped from Twitter and used for alt-text generation.
CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning
Image captioning has been extensively studied in prior work; however, few experiments focus on generating captions with a non-autoregressive text decoder....
Crisscrossed Captions
The Crisscrossed Captions (CxC) dataset extends MS-COCO with graded human similarity judgments between images and captions, and is used in the evaluation of the MURAL model.
UMIC: An unreferenced metric for image captioning via contrastive learning
An unreferenced image captioning metric, trained via contrastive learning, that scores candidate captions without requiring reference captions.
CC12M dataset
The CC12M dataset is used for training and testing the proposed method. It contains 12 million image-text pairs, with captions harvested from web alt-text.
Flickr8K-Expert dataset
The Flickr8K-Expert dataset is used for evaluating the proposed method. It provides expert human judgments of caption quality for image-caption pairs drawn from the 8,000-image Flickr8K dataset.
Composite Dataset
The Composite dataset contains 11,985 human judgments over Flickr 8K, Flickr 30K, and COCO captions.
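Judgment collections like Composite and Flickr8K-Expert are typically used to validate a caption metric by rank-correlating its scores with the human ratings. A minimal Kendall tau-a sketch over hypothetical data (the ratings below are illustrative, not from any dataset):

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        prod = (x1 - x2) * (y1 - y2)
        if prod > 0:
            concordant += 1   # pair ranked the same way by both
        elif prod < 0:
            discordant += 1   # pair ranked oppositely
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs

human = [1, 2, 3, 4, 4]              # hypothetical human ratings
metric = [0.1, 0.4, 0.3, 0.8, 0.9]   # hypothetical metric scores
print(round(kendall_tau(human, metric), 3))  # 0.7
```

A higher tau means the metric orders captions more like the human annotators do; published metric papers report exactly this kind of correlation on these judgment sets.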
Conceptual captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning
A dataset of images paired with captions derived from cleaned, hypernymed web alt-text, built for automatic image captioning.
Flickr30K Entities
The Flickr30K Entities dataset consists of 31,783 images, each matched with 5 captions. The dataset links distinct sentence entities to image bounding boxes, resulting in 70K...
Microsoft COCO: common objects in context
The COCO dataset is a large-scale dataset for object detection, segmentation, and image captioning; each image is paired with five human-written captions.
Flickr30K-EE
Explicit Caption Editing (ECE), refining reference image captions through a sequence of explicit edit operations (e.g., KEEP, DELETE), has attracted significant attention due to...
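The editing idea above can be sketched as applying one predicted operation per reference token. This toy example uses only the KEEP/DELETE operations the entry names; real ECE systems also predict insertion-style operations, which are omitted here, and the caption and operation sequence below are made up for illustration:

```python
def apply_edits(tokens, ops):
    """Apply one explicit edit operation per reference token."""
    assert len(tokens) == len(ops), "one operation per reference token"
    # KEEP copies the token into the output; DELETE drops it.
    return [tok for tok, op in zip(tokens, ops) if op == "KEEP"]

ref = ["a", "small", "dog", "on", "the", "red", "sofa"]
ops = ["KEEP", "DELETE", "KEEP", "KEEP", "KEEP", "DELETE", "KEEP"]
print(" ".join(apply_edits(ref, ops)))  # a dog on the sofa
```

Because the model outputs an edit trace rather than a caption from scratch, the result stays close to the reference and each change is directly interpretable.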
CrowdCaption Dataset
The CrowdCaption dataset contains 11,161 images with 21,794 group regions and 43,306 group captions; each group has an average of two captions.