Chinese CLIP

A vision-language pre-training dataset, Chinese CLIP, which consists of 100 million image-text pairs.

BibTex: