Training CLIP models on Data from Scientific Papers

Contrastive Language-Image Pretraining (CLIP) models are trained on datasets extracted from web crawls, which are large in quantity but of limited quality. This paper explores whether a limited amount of higher-quality data from a specific domain improves the general performance of CLIP models.
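For context, the contrastive objective referenced above pairs each image in a batch with its caption and trains both encoders so that matched pairs score higher than all mismatched pairs. A minimal PyTorch sketch of this symmetric loss follows; the function name and temperature value are illustrative and not taken from the dataset's code:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_embeds, text_embeds, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_embeds, text_embeds: (batch, dim) tensors from the two encoders.
    """
    # L2-normalise so the dot product becomes a cosine similarity.
    image_embeds = F.normalize(image_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)

    # (batch, batch) similarity matrix; the diagonal holds matched pairs.
    logits = image_embeds @ text_embeds.t() / temperature

    # Each image should match its own caption, and vice versa.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```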

Data and Resources

Cite this as

Calvin Metzger (2024). Dataset: Training CLIP models on Data from Scientific Papers. https://doi.org/10.57702/vaozuc3r

DOI retrieved: December 2, 2024

Additional Info

Field        Value
Created      December 2, 2024
Last update  December 2, 2024
Defined In   https://doi.org/10.48550/arXiv.2311.04711
Author       Calvin Metzger
Homepage     https://github.com/nopperl/clip_arxiv_pmc