Image Captioning - Groups

Microsoft COCO Captions

A large dataset of captions for images.

Dataset
JSON

RedCaps: Web-curated image-text data created by the people, for the people

A dataset of web-curated image-text data created by the people, for the people.

Dataset
JSON

CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

A dataset of Creative-Commons-licensed images, which is used to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2).

Dataset
JSON

POPE

POPE (Polling-based Object Probing Evaluation) is a benchmark dataset for evaluating object hallucination in multimodal large language models, used for multimodal tasks...

Dataset
JSON

LLaVA-1.5

LLaVA-1.5 is a multimodal large language model built on a LLaMA-family language model; the variant referenced here has 7 billion parameters and is used for multimodal...

Dataset
JSON

YFCC100M

The dataset used in the paper is YFCC100M, a large-scale dataset of about 100 million Flickr photos and videos. The dataset is used for foreground and background patch extraction and object recognition tasks.

Dataset
JSON

CC12M

The Conceptual 12M (CC12M) dataset of image-text pairs, used in the paper.

Dataset
JSON

Generalizable Entity Grounding via Assistance of Large Language Model

The GELLA framework leverages a large language model to ground entities with long captions.

Dataset
JSON

COCO Captions

Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect.

Dataset
JSON

Amazon Berkeley Objects Dataset (ABO)

The Amazon Berkeley Objects Dataset (ABO) is a publicly available e-commerce dataset with multiple images per product.

Dataset
JSON

MS COCO captions

The MS COCO captions dataset contains captions for images in the Microsoft COCO dataset.

Dataset
JSON

CUB captions

The CUB captions dataset contains captions for images in the Caltech-UCSD Birds 200 dataset.

Dataset
JSON

COCO Dataset

The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,...

Dataset
JSON

Visual Genome

The Visual Genome dataset is a large-scale dataset of roughly 108,000 images, each densely annotated with entities, attributes, and relationships, and accompanied by visual question answering pairs.

Dataset
JSON

MS-COCO

Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to build such datasets, and usually it could...

Dataset
JSON

Show and tell: A neural image caption generator

Dataset
JSON

From show to tell: A survey on deep learning-based image captioning

Dataset
JSON

Microsoft COCO

The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...

Dataset
JSON

Self-Supervised Image Captioning with CLIP

Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning...

Dataset
JSON

COCO

Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to build such datasets, and usually it could...

Dataset
JSON

83 datasets found