Vision-Language - Groups - LDM

Chinese CLIP

A vision-language pre-training dataset, Chinese CLIP, which consists of 100 million image-text pairs.
- Dataset
- JSON
BLIP2

A vision-language pre-training dataset, BLIP2, which consists of 100 million image-text pairs.
- Dataset
- JSON

Before browse our site, please accept our cookies policy