6 datasets found

Tags: vision-language

  • Chinese CLIP

    A vision-language pre-training dataset consisting of 100 million image-text pairs.
  • BLIP2

    A vision-language pre-training dataset consisting of 100 million image-text pairs.
  • WebLI Dataset

    The WebLI dataset used for training and evaluation of the CoBIT model.
  • JFT-4B Dataset

    The JFT-4B dataset used for training and evaluation of the CoBIT model.
  • ALIGN Dataset

    The ALIGN dataset used for training and evaluation of the CoBIT model.
  • CoBIT Dataset

    The dataset used for training and evaluating the CoBIT model; it consists of image-text pairs drawn from large-scale noisy web-crawled data and from image annotation data.