-
Microsoft COCO Captions
A large dataset of captions for images. -
RedCaps: Web-curated image-text data created by the people, for the people
A dataset of web-curated image-text data created by the people, for the people. -
CommonCanvas: An Open Diffusion Model Trained with Creative-Commons Images
A dataset of Creative-Commons-licensed images, which is used to train a set of open diffusion models that are qualitatively competitive with Stable Diffusion 2 (SD2). -
Generalizable Entity Grounding via Assistance of Large Language Model
The GELLA framework leverages a large language model to ground the entities mentioned in long captions. -
COCO Captions
Object detection is a fundamental task in computer vision, requiring large annotated datasets that are difficult to collect. -
Amazon Berkeley Objects Dataset (ABO)
The Amazon Berkeley Objects Dataset (ABO) is a publicly available e-commerce dataset with multiple images per product. -
MS COCO captions
The MS COCO captions dataset contains captions for images in the Microsoft COCO dataset. -
CUB captions
The CUB captions dataset contains captions for images in the Caltech-UCSD Birds 200 dataset. -
COCO Dataset
The COCO dataset is a large-scale dataset for object detection, semantic segmentation, and captioning. It contains 80 object categories and 1,000 image instances per category,... -
Visual Genome
The Visual Genome dataset is a large-scale visual question answering dataset, containing roughly 108,000 images, each with 15-30 annotated entities, attributes, and relationships. -
Show and tell: A neural image caption generator
-
From show to tell: A survey on deep learning-based image captioning
-
Microsoft COCO
The Microsoft COCO dataset was used for training and evaluating the CNNs because it has become a standard benchmark for testing algorithms aimed at scene understanding and... -
Self-Supervised Image Captioning with CLIP
Image captioning, a fundamental task in vision-language understanding, seeks to generate accurate natural language descriptions for provided images. Current image captioning...