9 datasets found

  • ZeroVL dataset

    The dataset used for training the ZeroVL model, consisting of 14.23M image-text pairs from various domains.
  • DataComp-1B

    DataComp-1B is a large-scale dataset of roughly 1.4 billion curated image-text pairs, built as a pool for training next-generation image-text models.
  • LAION-400M and LAION-5B

    LAION-400M and LAION-5B are openly released datasets of 400 million and 5 billion image-text pairs, respectively, widely used for training next-generation image-text models.
  • ImageNet with Adversarial Text Regions

    ImageNet with Adversarial Text Regions (ImageNet-Atr) is an evaluation set built by adding spotting words to images from the ImageNet evaluation set.
  • YFCC15M-V2

    A re-collected 15M-pair subset of YFCC100M introduced by DeCLIP, used for training Contrastive Language-Image Pretraining (CLIP) models and their variants.
  • YFCC15M-V1

    The 15M-pair subset of YFCC100M filtered with the rules described by OpenAI, used for training Contrastive Language-Image Pretraining (CLIP) models and their variants.
  • YFCC15M

    A mid-scale dataset of 15M image-text pairs that offers a good balance between training cost and performance; it is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
  • BLIP

    BLIP is a vision-language pre-training framework; its bootstrapped and filtered captions are commonly used as higher-quality image-text training data.
  • CLIP

    The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be computationally expensive.
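
As a rough illustration of the contrastive objective these datasets are used for, below is a minimal NumPy sketch of a symmetric CLIP-style InfoNCE loss. The function name, temperature value, and toy batch are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    (Illustrative sketch, not the exact implementation from any listed paper.)
    """
    # L2-normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; entry (i, j) scores image i against text j
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(lg):
        # targets are the diagonal: image i matches text i
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# toy usage with a random batch of 4 pairs in an 8-dim embedding space
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = clip_contrastive_loss(emb, emb, temperature=0.01)
loss_random = clip_contrastive_loss(emb, rng.normal(size=(4, 8)))
print(loss_matched < loss_random)  # matched pairs yield a lower loss
```

The symmetric form (averaging both directions of the cross-entropy) is what makes the objective treat images and texts interchangeably; scaling to the dataset sizes above is mostly a matter of batch size and compute.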