LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval
A method that combines dual and cross encoder architectures in a single network for image-text retrieval, so the two components can improve each other. -
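A toy sketch of the two scoring styles the entry contrasts, not the paper's implementation: a dual encoder embeds each modality independently and scores all pairs with one matrix multiply, while a cross encoder jointly processes a single (image, text) pair. The fused-MLP scorer standing in for cross-attention, and all names and shapes, are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def dual_encoder_scores(img_emb, txt_emb):
    # Dual encoder: independent embeddings, then cosine similarity
    # for every image-text pair at once (cheap, indexable retrieval).
    return l2_normalize(img_emb) @ l2_normalize(txt_emb).T

def cross_encoder_score(img_feat, txt_feat, w):
    # Cross encoder: jointly process one pair; a toy fused scorer
    # stands in for cross-attention layers (slower, more accurate).
    fused = np.concatenate([img_feat, txt_feat])
    return float(np.tanh(fused) @ w)

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
txts = rng.normal(size=(4, 8))
S = dual_encoder_scores(imgs, txts)        # (4, 4) similarity matrix
w = rng.normal(size=16)
s = cross_encoder_score(imgs[0], txts[0], w)  # scalar score for one pair
```

In a combined setup, the cheap dual-encoder matrix is typically used to shortlist candidates that the heavier cross encoder then re-ranks.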
RECLIP: Resource-efficient CLIP by Training with Small Images
A simple method that reduces the computational footprint of CLIP (Contrastive Language-Image Pretraining) by training with small images and fine-tuning briefly at high resolution. -
DataCompDR
A reinforced variant of the DataComp dataset, augmented with higher-quality synthetic captions, used for CLIP pretraining. -
CC3M, SBU Captions, Visual Genome, and COCO
A pretraining corpus combining the CC3M, SBU Captions, Visual Genome, and COCO image-caption datasets, commonly used for vision-language pretraining.