- MSCOCO caption challenge dataset
The MSCOCO caption challenge dataset is a subset of the MSCOCO caption dataset, containing 113,287 training images, 5,000 validation images, and 5,000 test images.
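These counts match the widely used Karpathy split of MSCOCO. A minimal sketch of partitioning the 123,287 images by those counts (synthetic placeholder IDs stand in for real COCO image IDs, which are distributed as explicit per-split lists):

```python
# Karpathy-style split sizes from the description above.
TRAIN, VAL, TEST = 113_287, 5_000, 5_000

# Placeholder IDs; the published split files list the actual COCO image IDs.
image_ids = list(range(TRAIN + VAL + TEST))  # 123,287 IDs in total

train_ids = image_ids[:TRAIN]
val_ids = image_ids[TRAIN:TRAIN + VAL]
test_ids = image_ids[TRAIN + VAL:]

assert len(train_ids) == 113_287
assert len(val_ids) == len(test_ids) == 5_000
```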
- MSCOCO caption dataset
The MSCOCO caption dataset is a large-scale image captioning dataset. It consists of 123,000 images with 5 captions each.
- Show-and-Tell
Visual language grounding is widely studied with modern neural networks, which typically adopt an encoder-decoder framework consisting of a convolutional neural network (CNN) for...
- Image Captioning and Visual Question Answering
The dataset is used for image captioning and visual question answering.
- Flickr 8k Dataset
The Flickr 8k dataset is a benchmark for image captioning. It contains 8,000 images annotated with 5 human captions each.
- Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human...
- Augmented Flickr-8K Dataset
A dataset of images annotated with captions and semantic tuples, created by training a model to predict semantic tuples from image captions.
- Flickr30K and MSCOCO
The datasets used in the paper are Flickr30K and MSCOCO, which support image-text matching and image captioning tasks.
- Flickr 30k Dataset
The Flickr 30k dataset is an image captioning dataset containing roughly 31,000 images with 5 captions each.
- MSCOCO 2014 Captions Dataset
The MSCOCO 2014 captions dataset contains 123,287 images, split into an 82,783-image training set and a 40,504-image validation set. Each image is labeled with five captions.
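The 2014 captions are distributed as JSON with separate `images` and `annotations` lists, where each annotation links one caption to its image through an `image_id` field. A minimal sketch with a hand-made record in that layout (the field names follow the COCO annotation schema; the example values are invented):

```python
import json
from collections import defaultdict

# Hand-made record in the COCO captions JSON layout (invented values).
coco_json = json.dumps({
    "images": [{"id": 42, "file_name": "COCO_train2014_000000000042.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 42, "caption": "A dog running on a beach."},
        {"id": 2, "image_id": 42, "caption": "A brown dog plays in the sand."},
    ],
})

data = json.loads(coco_json)

# Group the per-image captions (five in the real dataset) by image_id.
captions_by_image = defaultdict(list)
for ann in data["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(captions_by_image[42])
```

The same grouping step is what caption loaders typically do before pairing an image with its reference captions for training or evaluation.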
- High Quality Image Text Pairs
The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.
- Microsoft COCO 2014 and 2017
The Microsoft COCO 2014 and 2017 datasets are used for object detection, segmentation, and captioning.
- COCO dataset (Brazilian Portuguese)
The dataset used for training the Brazilian Portuguese version of the GRIT model, a translation of the COCO dataset.
- Semantic Communication Dataset
The dataset used in this paper for semantic communication, consisting of images and their corresponding captions.
- Pascal Flickr dataset
The Pascal Flickr dataset is a collection of captions for images from Flickr.
- Image Captioning Task
The dataset used in the paper supports an image captioning task.
- TextCaps: A dataset for image captioning with reading comprehension
TextCaps is a dataset for image captioning that requires reading and reasoning about text present in the images.
- Remote Sensing Image Captioning
The Remote Sensing Image Captioning Dataset (RSICD) and the UCM-captions dataset are used for remote sensing image captioning.
- Conceptual Captions 3.3M
Conceptual Captions 3.3M is a large-scale dataset of roughly 3.3 million images, each paired with a single caption derived from web alt-text.