Dataset - LDM

Conceptual Captions 3.3M

Conceptual Captions 3.3M is a large-scale dataset of about 3.3M images, each paired with a single caption derived from web alt-text.
- Dataset
- JSON
SBU Captioned Photos

The SBU Captioned Photos (SBU) dataset consists of 1M images with associated visually relevant captions.
- Dataset
- JSON
RSICD dataset

The RSICD dataset (Lu et al.) contains remote sensing images paired with captions.
- Dataset
- JSON
CC12M

CC12M (Conceptual 12M) is the Conceptual Captions dataset of about 12M image-text pairs used in the paper.
- Dataset
- JSON
COCO Captions

COCO Captions pairs images from the COCO dataset with human-written captions; each training and validation image has five captions.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
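
As a rough illustration of working with registry entries like those listed here, the sketch below parses a JSON listing and searches it by keyword. The payload shape and the field names `name` and `description` are assumptions made for this example, not the registry's documented schema, and `load_registry` and `search` are hypothetical helpers; the API Docs remain the authority on the actual endpoints and response format.

```python
import json

# Hypothetical registry payload. The field names ("name", "description") are
# assumptions for illustration, not the registry's documented schema.
REGISTRY_JSON = """
[
  {"name": "Conceptual Captions 3.3M", "description": "Image-caption pairs"},
  {"name": "SBU Captioned Photos", "description": "1M images with captions"},
  {"name": "RSICD dataset", "description": "Remote sensing image captions"},
  {"name": "CC12M", "description": "Conceptual captions dataset"},
  {"name": "COCO Captions", "description": "Captions for COCO images"}
]
"""


def load_registry(text):
    """Parse a JSON listing and keep only entries that carry a name."""
    entries = json.loads(text)
    return [entry for entry in entries if "name" in entry]


def search(entries, keyword):
    """Return names of entries whose name or description contains keyword."""
    needle = keyword.lower()
    return [
        entry["name"]
        for entry in entries
        if needle in entry["name"].lower()
        or needle in entry.get("description", "").lower()
    ]


entries = load_registry(REGISTRY_JSON)
print(f"{len(entries)} datasets found")
print(search(entries, "remote sensing"))
```

The search is a case-insensitive substring match over both fields, which keeps the example dependency-free; a real client would fetch the listing from the API rather than from an inline string.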

5 datasets found