Conceptual Captions 3.3M
Conceptual Captions 3.3M is a large-scale dataset of roughly 3.3 million image-caption pairs, with one caption per image, harvested from the alt-text attached to images on web pages.
SBU Captioned Photos
The SBU Captioned Photos (SBU) dataset consists of 1M images with associated, visually relevant captions.
RSICD dataset
The RSICD (Remote Sensing Image Captioning Dataset) dataset (Lu et al.) contains captions for remote sensing images.
COCO Captions
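COCO Captions builds on the Microsoft COCO images, pairing each image with human-written captions (five per image for the training and validation splits). The underlying COCO dataset was originally collected for object detection, a fundamental task in computer vision that requires large annotated datasets, which are difficult to collect.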
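COCO Captions annotations use the same JSON layout as other COCO annotation files: an `images` list and an `annotations` list in which each entry carries an `image_id` and a `caption` string. Because several captions share one image, a common first step is grouping them by image. The sketch below assumes that layout; the helper name `group_captions` and the sample data are hypothetical, made up for illustration, and are not real COCO records.

```python
from collections import defaultdict


def group_captions(coco_captions: dict) -> dict[int, list[str]]:
    """Map each image_id to the list of its captions.

    Assumes the COCO captions layout: a top-level "annotations" list whose
    entries have "image_id" and "caption" keys.
    """
    by_image: dict[int, list[str]] = defaultdict(list)
    for ann in coco_captions["annotations"]:
        # Strip stray whitespace, which sometimes surrounds human-written captions.
        by_image[ann["image_id"]].append(ann["caption"].strip())
    return dict(by_image)


# Hypothetical hand-made sample in the same layout (not real COCO data).
sample = {
    "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
    "annotations": [
        {"id": 10, "image_id": 1, "caption": "A dog runs on the grass."},
        {"id": 11, "image_id": 1, "caption": "A brown dog playing outside. "},
        {"id": 12, "image_id": 2, "caption": "Two people ride bicycles."},
    ],
}

captions = group_captions(sample)
print({image_id: len(caps) for image_id, caps in captions.items()})  # {1: 2, 2: 1}
```

With a real annotation file, the same function could be applied to the result of `json.load` on, for example, the 2014 training captions file (`captions_train2014.json`); on those splits most images should map to five captions, unlike single-caption datasets such as Conceptual Captions.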