LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
A method that combines dual and cross encoder architectures in a single network for image-text retrieval, so the two components can improve each other. -
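A toy sketch of the two scoring styles the entry contrasts, not the paper's implementation: a dual encoder embeds each modality independently and scores all pairs with one matrix multiply, while a cross encoder jointly processes a single (image, text) pair. The fused-MLP scorer standing in for cross-attention, and all names and shapes, are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def dual_encoder_scores(img_emb, txt_emb):
    # Dual encoder: independent embeddings, then cosine similarity
    # for every image-text pair at once (cheap, indexable retrieval).
    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T

def cross_encoder_score(img_feat, txt_feat, w):
    # Cross encoder: jointly process one pair; a toy fused scorer
    # stands in for cross-attention layers (slower, more accurate).
    fused = np.concatenate([img_feat, txt_feat])
    return float(np.tanh(fused) @ w)

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
txts = rng.normal(size=(4, 8))
S = dual_encoder_scores(imgs, txts)        # (4, 4) similarity matrix
w = rng.normal(size=16)
s = cross_encoder_score(imgs[0], txts[0], w)  # scalar score for one pair
```

In a combined setup, the cheap dual-encoder matrix is typically used to shortlist candidates that the heavier cross encoder then re-ranks.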
RECLIP: Resource-efficient CLIP by Training with Small Images
A simple method that reduces the computational footprint of CLIP (Contrastive Language-Image Pretraining) by training with small images and fine-tuning briefly at high resolution. -
DataCompDR
A reinforced variant of the DataComp dataset, augmented with higher-quality synthetic captions, used for CLIP pretraining. -
CC3M, SBU Captions, Visual Genome, and COCO
A pretraining corpus combining the CC3M, SBU Captions, Visual Genome, and COCO image-caption datasets, commonly used for vision-language pretraining.