YFCC15M

YFCC15M is a mid-scale dataset of about 15M image-text pairs, which offers a good balance between training cost and performance. It is used for Contrastive Language-Image Pretraining (CLIP) and its variants.
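CLIP trains an image encoder and a text encoder jointly so that, within a batch, each image scores higher against its own caption than against the other captions, and vice versa. The minimal NumPy sketch below illustrates a symmetric contrastive (InfoNCE-style) loss of that kind. The function name `clip_contrastive_loss`, the fixed temperature of 0.07, and the toy embeddings are illustrative assumptions. They are not part of this dataset or of any particular training codebase.

```python
import numpy as np


def clip_contrastive_loss(image_emb: np.ndarray, text_emb: np.ndarray,
                          temperature: float = 0.07) -> float:
    """Symmetric contrastive loss over a batch of paired embeddings.

    Hypothetical helper: row i of image_emb and row i of text_emb are assumed
    to come from the same image-text pair; every other row is a negative.
    """
    # L2-normalize so that dot products become cosine similarities.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # (N, N) similarity matrix; entry [i, j] compares image i with text j.
    logits = img @ txt.T / temperature
    labels = np.arange(logits.shape[0])  # matching pairs lie on the diagonal

    def cross_entropy(l: np.ndarray) -> float:
        # Numerically stable row-wise log-softmax, then mean NLL of the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-log_probs[labels, labels].mean())

    # Average the image->text and text->image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

With random toy embeddings, well-aligned pairs (text nearly equal to its image embedding) should yield a much lower loss than unrelated pairs. In real CLIP-style training the embeddings come from the two encoders and the temperature is typically a learnable parameter rather than a constant.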

BibTeX: