Image Captioning - Groups

BanglaLekhaImageCaptions dataset

The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions for each image.

Dataset
JSON

Conceptual Captions

The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.

Dataset
JSON

Flickr30k

The Flickr30k dataset is widely utilized for image caption and image-text retrieval tasks, providing a substantial collection of images with associated captions.

Dataset
JSON

YFCC100M

The dataset used in the paper is YFCC100M, a large-scale video dataset. The dataset is used for foreground and background patch extraction and object recognition tasks.

Dataset
JSON

Visual Genome

The Visual Genome dataset is a large-scale visual question answering dataset, containing 1.5 million images, each with 15-30 annotated entities, attributes, and relationships.

Dataset
JSON

MS-COCO

Large scale datasets [18, 17, 27, 6] boosted text conditional image generation quality. However, in some domains it could be difficult to make such datasets and usually it could...

Dataset
JSON

COCO