FLIP: A Method for Reducing Computation in Contrastive Language-Image Pre-training

This paper proposes FLIP, a method that randomly masks out half or more of the image patches during training, reducing computation by 2x and allowing larger batch sizes at the same memory cost.
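The core idea, randomly dropping a fraction of image patches before encoding, can be illustrated with a minimal NumPy sketch. The function name, shapes, and patch dimension below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def random_mask_patches(patches, mask_ratio=0.5, rng=None):
    """Randomly drop a fraction of patches and keep the rest.

    patches: array of shape (num_patches, patch_dim)
    Returns the kept patches and the indices that were kept.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    num_patches = patches.shape[0]
    num_keep = int(num_patches * (1 - mask_ratio))
    # Shuffle patch indices and keep the first num_keep of them.
    keep_idx = rng.permutation(num_patches)[:num_keep]
    return patches[keep_idx], keep_idx

# Example: a 224x224 image with 16x16 patches yields 14*14 = 196 patches.
patches = np.random.default_rng(1).normal(size=(196, 768))
kept, idx = random_mask_patches(patches, mask_ratio=0.5)
print(kept.shape)  # (98, 768)
```

With a 0.5 mask ratio, the image encoder sees only half the patch sequence, which is where the roughly 2x reduction in encoding compute comes from.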

BibTeX: