3 datasets found
Tags: image-text pairs

ZeroVL dataset
The dataset used for training the ZeroVL model, consisting of 14.23M image-text pairs from various domains.

BLIP
The dataset used in the paper is a pre-trained diffusion backbone and a pre-trained vision-language guidance model.

CLIP
The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be...