Image Captioning - Groups

CC12M dataset

The CC12M dataset is used for training and testing the proposed method. It contains about 12 million image-caption pairs.

Flickr8K-Expert dataset

The Flickr8K-Expert dataset is used for evaluating the proposed method. It contains 8,000 images with 8,000 captions.

Conceptual Captions 12M and RedCaps

The datasets used in the paper are Conceptual Captions 12M (CC12M) and RedCaps.

Conceptual Captions 3M, Conceptual Captions 12M, RedCaps, and LAION-400M

The datasets used in the paper are Conceptual Captions 3M (CC3M), Conceptual Captions 12M (CC12M), RedCaps, and LAION-400M.

Conceptual Captions

Conceptual Captions is used in the paper "Scaling Laws of Synthetic Images for Model Training" for supervised image classification and zero-shot classification tasks.

Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.

MSCOCO dataset

The MSCOCO dataset is a large-scale image captioning dataset, containing 113,287 training images, 5,000 validation images, and 5,000 test images. The dataset is used for training and...

Microsoft COCO Captions

A large dataset of captions for images.

YFCC100M

The dataset used in the paper is YFCC100M, a large-scale multimedia dataset of roughly 100 million Flickr photos and videos. It is used for foreground and background patch extraction and object recognition tasks.

COCO Dataset

The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,...

COCO

Large-scale datasets [18, 17, 27, 6] boosted text-conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

MSCOCO

Human Pose Estimation (HPE) aims to estimate the position of each joint point of the human body in a given image. HPE tasks support a wide range of downstream tasks such as...

12 datasets found