39 datasets found

Tags: Image Captioning

  • Conceptual Captions 3M

    The Conceptual Captions 3M (CC3M) dataset is a large-scale image captioning dataset.
  • Class-Conditional Self-Rewarding for Text-to-Image Models

    A self-rewarding mechanism for text-to-image models that uses image-captioning methods.
  • BanglaLekhaImageCaptions dataset

    The BanglaLekhaImageCaptions dataset is a modified version of the dataset introduced in [24]. It contains 9,154 images with two captions for each image.
  • VQAv2

    Visual Question Answering (VQA) has achieved great success thanks to the rapid development of deep neural networks (DNNs). On the other hand, data augmentation, as one of the...
  • Conceptual Captions

    The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". The dataset is used for supervised image classification and zero-shot classification tasks.
  • Image COCO

    The Image COCO dataset’s image caption annotations, from which 10,000 sentences are sampled as a training set and another 10,000 as a test set.
  • Flickr30k

    The Flickr30k dataset is widely used for image captioning and image-text retrieval tasks, providing a substantial collection of images with associated captions.
  • Conceptual 12M

    The Conceptual 12M (CC12M) dataset for automatic image captioning.
  • RedCaps: Web-curated image-text data created by the people, for the people

    RedCaps is a large-scale dataset of image-text pairs collected from Reddit.
  • CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images

    A dataset of Creative-Commons-licensed images, which is used to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2).
  • POPE

    POPE (Polling-based Object Probing Evaluation) is a benchmark for evaluating object hallucination in multimodal large language models.
  • LLaVA-1.5

    LLaVA-1.5 is a multimodal large language model built on LLaMA, released in 7B and 13B parameter variants; the associated visual instruction-tuning data is used for multimodal...
  • RefCOCO, RefCOCO+, and RefCOCOg

    Visual grounding is a task that aims to locate a target object according to a natural language expression. The datasets used in this paper are RefCOCO, RefCOCO+, and RefCOCOg.
  • COCO Dataset

    The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It covers 80 object categories, with over a million labeled object instances across more than 300,000 images,...
  • LAION

    A large-scale captioned-image dataset; the paper does not describe it in detail beyond noting that LAION was used to train the Stable Diffusion model.
  • MS-COCO

    Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to build such datasets, and usually it could...
  • Microsoft COCO

    The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and...
  • COCO

    Large-scale datasets [18, 17, 27, 6] have boosted text-conditional image generation quality. However, in some domains it can be difficult to build such datasets, and usually it could...
  • MSCOCO

    Human Pose Estimation (HPE) aims to estimate the position of each joint of the human body in a given image. HPE supports a wide range of downstream tasks such as...
You can also access this registry using the API (see API Docs).