6 datasets found

Tags: vision-language

  • Chinese CLIP

    A vision-language pre-training dataset consisting of 100 million image-text pairs.
  • BLIP2

    A vision-language pre-training dataset consisting of 100 million image-text pairs.
  • WebLI Dataset

    The WebLI dataset used for training and evaluation of the CoBIT model.
  • JFT-4B Dataset

    The JFT-4B dataset used for training and evaluation of the CoBIT model.
  • ALIGN Dataset

    The ALIGN dataset used for training and evaluation of the CoBIT model.
  • CoBIT Dataset

    The dataset used for training and evaluating the CoBIT model; it consists of image-text pairs drawn from large-scale noisy web-crawled data and from image annotation data.