- MSCOCO caption challenge dataset
The MSCOCO caption challenge dataset is a subset of the MSCOCO caption dataset, containing 113,287 training images, 5,000 validation images, and 5,000 test images.
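These counts match the widely used Karpathy split of MSCOCO. A minimal sketch of partitioning the 123,287 images by those counts (synthetic placeholder IDs stand in for real COCO image IDs, which are distributed as explicit per-split lists):

```python
# Karpathy-style split sizes from the description above.
TRAIN, VAL, TEST = 113_287, 5_000, 5_000

# Placeholder IDs; the published split files list the actual COCO image IDs.
image_ids = list(range(TRAIN + VAL + TEST))  # 123,287 IDs in total

train_ids = image_ids[:TRAIN]
val_ids = image_ids[TRAIN:TRAIN + VAL]
test_ids = image_ids[TRAIN + VAL:]

assert len(train_ids) == 113_287
assert len(val_ids) == len(test_ids) == 5_000
```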
- MSCOCO caption dataset
The MSCOCO caption dataset is a large-scale image captioning dataset. It consists of 123,000 images with 5 captions each.
- Show-and-Tell
Visual language grounding is widely studied with modern neural networks, which typically adopt an encoder-decoder framework consisting of a convolutional neural network (CNN) for...
- Image Captioning and Visual Question Answering
The dataset is used for image captioning and visual question answering.
- Flickr 8k Dataset
The Flickr 8k dataset is a benchmark for image captioning. It contains 8,000 images annotated with 5 human captions each.
- Learning to Evaluate Image Captioning
Evaluation metrics for image captioning face two challenges. Firstly, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human...
- Augmented Flickr-8K Dataset
A dataset of images annotated with captions and semantic tuples, created by training a model to predict semantic tuples from image captions.
- Flickr30K and MSCOCO
The datasets used in the paper are Flickr30K and MSCOCO, which support image-text matching and image captioning tasks.
- Flickr 30k Dataset
The Flickr 30k dataset is an image captioning dataset containing roughly 31,000 images with 5 captions each.
- MSCOCO 2014 Captions Dataset
The MSCOCO 2014 captions dataset contains 123,287 images, split into an 82,783-image training set and a 40,504-image validation set. Each image is labeled with five captions.
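The 2014 captions are distributed as JSON with separate `images` and `annotations` lists, where each annotation links one caption to its image through an `image_id` field. A minimal sketch with a hand-made record in that layout (the field names follow the COCO annotation schema; the example values are invented):

```python
import json
from collections import defaultdict

# Hand-made record in the COCO captions JSON layout (invented values).
coco_json = json.dumps({
    "images": [{"id": 42, "file_name": "COCO_train2014_000000000042.jpg"}],
    "annotations": [
        {"id": 1, "image_id": 42, "caption": "A dog running on a beach."},
        {"id": 2, "image_id": 42, "caption": "A brown dog plays in the sand."},
    ],
})

data = json.loads(coco_json)

# Group the per-image captions (five in the real dataset) by image_id.
captions_by_image = defaultdict(list)
for ann in data["annotations"]:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(captions_by_image[42])
```

The same grouping step is what caption loaders typically do before pairing an image with its reference captions for training or evaluation.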
- High Quality Image Text Pairs
The High Quality Image Text Pairs (HQITP-134M) dataset consists of 134 million diverse and high-quality images paired with descriptive captions and titles.
- Microsoft COCO 2014 and 2017
The Microsoft COCO 2014 and 2017 datasets are used for object detection, segmentation, and captioning.
- COCO dataset (Brazilian Portuguese)
The dataset used for training the Brazilian Portuguese version of the GRIT model, a translation of the COCO dataset.
- Semantic Communication Dataset
The dataset used in this paper for semantic communication, consisting of images and their corresponding captions.
- Pascal Flickr dataset
The Pascal Flickr dataset is a collection of captions for images from Flickr.
- Image Captioning Task
The dataset used in the paper supports an image captioning task.
- TextCaps: A dataset for image captioning with reading comprehension
TextCaps is a dataset for image captioning that requires reading and reasoning about text present in the images.
- Remote Sensing Image Captioning
The Remote Sensing Image Captioning Dataset (RSICD) and the UCM-captions dataset are used for remote sensing image captioning.
- Conceptual Captions 3.3M
Conceptual Captions 3.3M is a large-scale dataset of roughly 3.3 million images, each paired with a single caption derived from web alt-text.