- National Diet Library Dataset
  A dataset containing 10,000 digitally archived images from various genres.
- Image–Text Pair Dataset from Books
  A dataset constructed from book images using an optical character reader (OCR), an object detector, and a layout analyzer for the autonomous extraction of image–text pairs.
- High Quality Image-Text Pairs (HQITP)
  The High Quality Image-Text Pairs (HQITP) dataset contains 134M high-quality image-caption pairs.
- ZeroVL dataset
  The dataset used for training the ZeroVL model, consisting of 14.23M image-text pairs from various domains.
- MARIO-OpenLibrary
  The MARIO-OpenLibrary dataset is a subset of the LAION-400M dataset, containing 523,684 book covers with corresponding titles.
- MARIO-TMDB
  The MARIO-TMDB dataset is a subset of the LAION-400M dataset, containing 343,423 English posters with corresponding titles.
- MARIO-LAION
  The MARIO-LAION dataset is a subset of the LAION-400M dataset, containing 9,194,613 high-quality text images with corresponding captions.
- Multimodal Learning (MLM) dataset
  The MLM dataset is a collection of images and captions that represent different cultures from around the world.
- RAMM: Retrieval-augmented Biomedical Visual Question Answering
  A retrieval-augmented pretrain-and-finetune paradigm for biomedical VQA, which includes a high-quality image-text pair dataset (PMCPM), a pre-trained multi-modal model, and a novel...
- General-context dataset
  General-context dataset containing diverse image-text pairs (top three rows), and DVP presented images with targeted translation of the RoI (bottom two rows).
- LAION-Face
  The LAION-Face dataset consists of 50 million image-text pairs to ensure diversity.
- Visual Spatial Reasoning
  Visual Spatial Reasoning (VSR) is a controlled probing dataset for testing vision-language models' capabilities of recognizing and reasoning about spatial relations in natural...