Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

This paper investigates the performance of Contrastive Language-Image Pre-training (CLIP) when scaled down to limited computation budgets.

Cite this as

Zichao Li, Cihang Xie, Ekin Dogus Cubuk (2024). Dataset: Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies. https://doi.org/10.57702/t6hrskc9

DOI retrieved: December 16, 2024

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Author        Zichao Li
More Authors  Cihang Xie, Ekin Dogus Cubuk