Training CLIP models on Data from Scientific Papers

Contrastive Language-Image Pretraining (CLIP) models are trained on datasets extracted from web crawls, which offer large quantity but limited quality. This paper explores whether limited amounts of higher-quality data from a specific domain improve the general performance of CLIP models.
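For context, CLIP models are trained with a symmetric contrastive (InfoNCE) objective over batches of paired image and text embeddings. The PyTorch sketch below illustrates that objective; the function name, temperature value, and tensor shapes are illustrative assumptions, not details taken from this paper.

```python
# Minimal sketch of the symmetric contrastive loss used in CLIP-style
# pretraining. Encoders and batching are assumed to happen elsewhere;
# inputs are (batch, dim) embedding tensors for matched image-text pairs.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds: torch.Tensor,
                          text_embeds: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # L2-normalize so dot products become cosine similarities
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j)
    logits = image_embeds @ text_embeds.t() / temperature

    # Matching pairs sit on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over image-to-text and text-to-image directions
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```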

BibTeX: