
Training CLIP models on Data from Scientific Papers

Contrastive Language-Image Pretraining (CLIP) models are trained on datasets extracted from web crawls, which are large in quantity but limited in quality. This paper explores whether a limited amount of higher-quality data from a specific domain improves the general performance of CLIP models.
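As a rough illustration of the kind of setup the paper investigates (not the released training code), the sketch below combines a large web-crawl dataset with a small, higher-quality dataset of pairs extracted from scientific papers for contrastive training. The tensors, dataset sizes, and oversampling weights are illustrative stand-ins, not values from the paper.

# Minimal sketch, assuming PyTorch; random tensors stand in for preprocessed
# (image, tokenized caption) pairs from each data source.
import torch
from torch.utils.data import (ConcatDataset, DataLoader, TensorDataset,
                              WeightedRandomSampler)

# Stand-ins: large noisy web-crawl data vs. a small high-quality domain subset.
web_crawl = TensorDataset(torch.randn(1000, 512), torch.randint(0, 49408, (1000, 77)))
papers    = TensorDataset(torch.randn(50, 512),   torch.randint(0, 49408, (50, 77)))

combined = ConcatDataset([web_crawl, papers])

# Oversample the small, high-quality subset so each batch mixes both sources
# (an illustrative choice, not necessarily the paper's mixing strategy).
weights = torch.cat([
    torch.full((len(web_crawl),), 1.0),
    torch.full((len(papers),), len(web_crawl) / len(papers)),
])
sampler = WeightedRandomSampler(weights, num_samples=len(combined), replacement=True)

loader = DataLoader(combined, batch_size=64, sampler=sampler)
for images, tokens in loader:
    pass  # feed each batch to the image/text encoders and the contrastive loss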

Data and Resources

This dataset has no data

Cite this as

Calvin Metzger (2024). Dataset: Training CLIP models on Data from Scientific Papers. https://doi.org/10.57702/vaozuc3r

Private DOI: this DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 2, 2024
Last update: December 2, 2024
Defined in: https://doi.org/10.48550/arXiv.2311.04711
Author: Calvin Metzger
Homepage: https://github.com/nopperl/clip_arxiv_pmc