- FLIP: A Method for Reducing Computation in Contrastive Language-Image Pre-training
  This paper proposes a method called FLIP, which masks half or more of the patches of the training images to reduce computation by 2x and allow the use of larger batch sizes (a minimal sketch of the masking step follows this list).
- Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
  This paper investigates the performance of Contrastive Language-Image Pre-training (CLIP) when scaled down to limited computation budgets.
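Below is a minimal sketch of the patch-masking idea behind FLIP, assuming a PyTorch setup where images have already been split into patch embeddings; the function name, tensor shapes, and default mask ratio are illustrative choices, not taken from the paper's code:

```python
import torch

def random_patch_mask(patch_tokens: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    """Keep a random subset of patch tokens per image and drop the rest.

    patch_tokens: (batch, num_patches, dim) patch embeddings.
    Returns (batch, num_keep, dim) with num_keep = num_patches * (1 - mask_ratio).
    """
    batch, num_patches, dim = patch_tokens.shape
    num_keep = int(num_patches * (1.0 - mask_ratio))

    # Shuffle patch indices independently for each image and keep the first num_keep.
    noise = torch.rand(batch, num_patches, device=patch_tokens.device)
    keep_idx = noise.argsort(dim=1)[:, :num_keep]            # (batch, num_keep)

    # Gather the surviving patch embeddings.
    keep_idx = keep_idx.unsqueeze(-1).expand(-1, -1, dim)     # (batch, num_keep, dim)
    return patch_tokens.gather(dim=1, index=keep_idx)


if __name__ == "__main__":
    # With mask_ratio=0.5 the image encoder sees half as many tokens,
    # which roughly halves its compute and frees memory for larger batches.
    tokens = torch.randn(8, 196, 768)   # e.g. ViT-B/16 patches of a 224x224 image
    kept = random_patch_mask(tokens, mask_ratio=0.5)
    print(kept.shape)                   # torch.Size([8, 98, 768])
```

The masked patch tokens are then fed to the image encoder and used in the usual contrastive image-text loss; since the per-image cost drops, the saved compute can be spent on a larger contrastive batch.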