- ZeroVL dataset
This dataset is used to train the ZeroVL model and consists of 14.23M image-text pairs from various domains.
- DataComp-1B
The paper also uses DataComp-1B, a large-scale dataset for training next-generation image-text models.
- LAION-400M and LAION-5B
The paper uses LAION-400M and LAION-5B, two large-scale datasets for training next-generation image-text models.
- ImageNet with Adversarial Text Regions
ImageNet with Adversarial Text Regions (ImageNet-Atr) is a new evaluation set built by adding spotting words to images from the ImageNet evaluation sets.
- YFCC15M-V2
The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
- YFCC15M-V1
The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.