Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

This paper investigates the performance of Contrastive Language-Image Pre-training (CLIP) when scaled down to limited computation budgets.

BibTeX: