CLIP

doi:10.57702/mxw4bsuu

The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be prohibitively expensive.

BibTeX:

@dataset{A_Radford_and_J_W_Kim_and_C_Hallacy_and_A_Ramesh_and_G_Goh_and_S_Agarwal_and_G_Sastry_and_A_Askell_and_P_Mishkin_and_J_Clark_2024,
    abstract = {The CLIP model and its variants are becoming the de facto backbone in many applications. However, training a CLIP model from hundreds of millions of image-text pairs can be prohibitively expensive.},
    author = {A. Radford and J. W. Kim and C. Hallacy and A. Ramesh and G. Goh and S. Agarwal and G. Sastry and A. Askell and P. Mishkin and J. Clark},
    doi = {10.57702/mxw4bsuu},
    institution = {No Organization},
    keyword = {CLIP, CLIP model, Generative Models, Image Synthesis, Instance Segmentation, Natural Language Supervision, Transferable Visual Models, Unsupervised Semantic Segmentation, image captioning, image generation, image-text pairs, large-scale dataset, pre-training, pretraining, text-driven image generation, transfer learning, vision foundation model, vision-and-language models, visual representation learning},
    month = {dec},
    publisher = {TIB},
    title = {CLIP},
    url = {https://service.tib.eu/ldmservice/dataset/clip},
    year = {2024}
}