Image Captioning - Groups

MMBench

MMBench: Is your multi-modal model an all-around player?
- Dataset
- JSON
Language models are few-shot learners

A language model that demonstrates capabilities in processing and generating human-like text.
- Dataset
- JSON
MMICL

MMICL: Empowering vision-language model with multi-modal in-context learning
- Dataset
- JSON
Prompt Highlighter

Prompt Highlighter is a novel paradigm for user-model interactions in multi-modal LLMs, offering output control through a token-level highlighting mechanism.
- Dataset
- JSON
Conceptual Captions 12M

The Conceptual Captions 12M (CC-12M) dataset consists of 12 million diverse and high-quality images paired with descriptive captions and titles.
- Dataset
- JSON
Conceptual Caption 3M

The Conceptual Caption 3M (CC-3M) dataset is a large-scale image captioning dataset.
- Dataset
- JSON
COCO-captions dataset

The COCO-captions dataset contains ∼120k RGB images with text captions.
- Dataset
- JSON
SS1M

SS1M is the dataset used in the paper for text-only image captioning with synthetic pairs.
- Dataset
- JSON
BanglaLekhaImageCaptions dataset

The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions for each image.
- Dataset
- JSON
MS COCO dataset

The MS COCO dataset is a large benchmark for image captioning, containing 328K images with 5 caption descriptions each.
- Dataset
- JSON
ReferIt

Referring image segmentation aims at localizing all pixels of the visual objects described by a natural language sentence.
- Dataset
- JSON
GeneIC

GeneIC uses the CUB-200 and Oxford-102 datasets for training and testing.
- Dataset
- JSON
Split MS-COCO

The dataset used in the paper is the Split MS-COCO dataset, which is a comprehensive framework for continual image captioning.
- Dataset
- JSON
Conceptual Captions 12M and RedCaps

The datasets used in the paper are Conceptual Captions 12M (CC12M) and RedCaps.
- Dataset
- JSON
Conceptual Captions 3M, Conceptual Captions 12M, RedCaps, and LAION-400M

The datasets used in the paper are Conceptual Captions 3M (CC3M), Conceptual Captions 12M (CC12M), RedCaps, and LAION-400M.
- Dataset
- JSON
Conceptual Captions

This dataset is used in the paper "Scaling Laws of Synthetic Images for Model Training" for supervised image classification and zero-shot classification tasks.
- Dataset
- JSON
Conceptual Captions 3M

The Conceptual Captions 3M dataset is a large-scale image-text dataset used for vision-language pre-training.
- Dataset
- JSON
Flickr30k

The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.
- Dataset
- JSON
MSCOCO dataset

The MSCOCO dataset is a large-scale image captioning dataset, containing 113,287 images with 5,000 validation images and 5,000 test images. The dataset is used for training and...
- Dataset
- JSON
Conceptual 12m

The Conceptual 12M dataset for automatic image captioning.
- Dataset
- JSON

83 datasets found