- MSCOCO 2014 Captions Dataset
The MSCOCO 2014 captions dataset contains 123,287 images, split into an 82,783-image training set and a 40,504-image validation set. Each image is labeled with five...
- MARIO-LAION
The MARIO-LAION dataset is a subset of the LAION-400M dataset, containing 9,194,613 high-quality text images with corresponding captions.
- Graphic Narrative Corpus
The Graphic Narrative Corpus (GNC) is a dataset of annotated comic book pages, representing English-language graphic novels with a variety of styles.
- 3DTopia-360K
The 3DTopia-360K dataset is a large-scale 3D object dataset used to train the 3DTopia model. It contains 360K 3D objects with detailed captions.
- MSCOCO dataset
The MSCOCO dataset is a large-scale image captioning dataset, containing 113,287 training images, 5,000 validation images, and 5,000 test images. The dataset is used for training and...
- ActivityNet Captions
ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video has several annotated segments with...