9 datasets found

  • ZeroVL dataset

    The dataset used for training the ZeroVL model, consisting of 14.23M image-text pairs from various domains.
  • DataComp-1B

    DataComp-1B is a large-scale dataset of roughly 1.4 billion curated image-text pairs, built as a pool for training next-generation image-text models.
  • LAION-400M and LAION-5B

    LAION-400M and LAION-5B are openly released datasets of 400 million and 5 billion image-text pairs, respectively, widely used for training next-generation image-text models.
  • ImageNet with Adversarial Text Regions

    ImageNet with Adversarial Text Regions (ImageNet-Atr) is an evaluation set built by adding spotting words to images from the ImageNet evaluation set.
  • YFCC15M-V2

    A re-collected 15M-pair subset of YFCC100M introduced by DeCLIP, used for training Contrastive Language-Image Pretraining (CLIP) models and their variants.
  • YFCC15M-V1

    The 15M-pair subset of YFCC100M filtered with the rules described by OpenAI, used for training Contrastive Language-Image Pretraining (CLIP) models and their variants.
  • YFCC15M

    A mid-scale dataset of 15M image-text pairs that offers a good balance between training cost and performance; it is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
  • BLIP

    BLIP is a vision-language pre-training framework; its bootstrapped and filtered captions are commonly used as higher-quality image-text training data.
  • CLIP

    The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be computationally expensive.
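
As a rough illustration of the contrastive objective these datasets are used for, below is a minimal NumPy sketch of a symmetric CLIP-style InfoNCE loss. The function name, temperature value, and toy batch are illustrative assumptions, not taken from any of the listed papers.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (N, D) arrays; row i of each is a matched pair.
    (Illustrative sketch, not the exact implementation from any listed paper.)
    """
    # L2-normalize so dot products are cosine similarities
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; entry (i, j) scores image i against text j
    logits = image_emb @ text_emb.T / temperature

    def cross_entropy(lg):
        # targets are the diagonal: image i matches text i
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

# toy usage with a random batch of 4 pairs in an 8-dim embedding space
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
loss_matched = clip_contrastive_loss(emb, emb, temperature=0.01)
loss_random = clip_contrastive_loss(emb, rng.normal(size=(4, 8)))
print(loss_matched < loss_random)  # matched pairs yield a lower loss
```

The symmetric form (averaging both directions of the cross-entropy) is what makes the objective treat images and texts interchangeably; scaling to the dataset sizes above is mostly a matter of batch size and compute.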