2 datasets found
Tags: multimodal dataset

Chinese CLIP
A vision-language pre-training dataset, Chinese CLIP, which consists of 100 million image-text pairs.

BLIP2
A vision-language pre-training dataset, BLIP2, which consists of 100 million image-text pairs.