4 datasets found

Tags: Image-Text Pre-training

  • ImageNet with Adversarial Text Regions

    The ImageNet with Adversarial Text Regions (ImageNet-Atr) dataset is a new evaluation set built by adding spotting words to images from the ImageNet evaluation sets.
  • YFCC15M-V2

    The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
  • YFCC15M-V1

    The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
  • YFCC15M

    Mid-scale data at the 15M level strikes a good balance between training cost and performance. The dataset is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
You can also access this registry using the API (see API Docs).