DEMYSTIFYING CLIP DATA

doi:10.57702/bs9ucoyr

Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative models. We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective. However, CLIP only provides very limited information about its data and how it has been collected, leading to works that aim to reproduce CLIP’s data by filtering with its model parameters. In this work, we intend to reveal CLIP’s data curation approach and in our pursuit of making it open to the community introduce Metadata-Curated Language-Image Pre-training (MetaCLIP). MetaCLIP takes a raw data pool and metadata (derived from CLIP’s concepts) and yields a balanced subset over the metadata distribution.
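
The last sentence of the abstract describes a pool-in, balanced-subset-out procedure. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes that metadata entries are matched against captions by case-insensitive substring search and that frequent entries are down-sampled toward a per-entry cap `t`; neither detail is stated in the abstract. The function name `balance_pool` and all parameter names are hypothetical.

```python
import random
from collections import Counter


def balance_pool(pairs, metadata, t, seed=0):
    """Return a subset of (image_id, caption) pairs balanced over metadata.

    Assumptions (illustrative, not taken from the abstract):
      * an entry "matches" a caption if it occurs as a substring;
      * an entry matched by more than `t` captions is down-sampled so
        that roughly `t` of its captions survive;
      * a pair is kept if at least one of its matched entries keeps it.
    """
    rng = random.Random(seed)

    # Step 1: record which metadata entries occur in each caption.
    matches = [
        [entry for entry in metadata if entry in caption.lower()]
        for _, caption in pairs
    ]

    # Step 2: count how many captions each entry matched.
    counts = Counter(entry for found in matches for entry in found)

    # Step 3: keep each pair with probability t / count for frequent
    # entries, and with probability 1 for entries at or below the cap.
    kept = []
    for pair, found in zip(pairs, matches):
        if any(rng.random() < t / max(counts[e], t) for e in found):
            kept.append(pair)
    return kept


if __name__ == "__main__":
    pool = [(i, f"a photo of a dog number {i}") for i in range(100)]
    pool.append((100, "an axolotl in a tank"))
    pool.append((101, "no known concept here"))
    subset = balance_pool(pool, ["dog", "axolotl"], t=2)
    print(len(subset), "of", len(pool), "pairs kept")
```

In this sketch, pairs that match no metadata entry are dropped, rare entries are kept in full, and common entries are thinned, which flattens the distribution over metadata entries.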

BibTeX:

@dataset{Hu_Xu_and_Saining_Xie_and_Xiaoqing_Ellen_Tan_and_Po-Yao_Huang_and_Russell_Howes_and_Vasu_Sharma_and_Shang-Wen_Li_and_Gargi_Ghosh_and_Luke_Zettlemoyer_and_Christoph_Feichtenhofer_2024,
    abstract = {Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative models. We believe that the main ingredient to the success of CLIP is its data and not the model architecture or pre-training objective. However, CLIP only provides very limited information about its data and how it has been collected, leading to works that aim to reproduce CLIP’s data by filtering with its model parameters. In this work, we intend to reveal CLIP’s data curation approach and in our pursuit of making it open to the community introduce Metadata-Curated Language-Image Pre-training (MetaCLIP). MetaCLIP takes a raw data pool and metadata (derived from CLIP’s concepts) and yields a balanced subset over the metadata distribution.},
    author = {Hu Xu and Saining Xie and Xiaoqing Ellen Tan and Po-Yao Huang and Russell Howes and Vasu Sharma and Shang-Wen Li and Gargi Ghosh and Luke Zettlemoyer and Christoph Feichtenhofer},
    doi = {10.57702/bs9ucoyr},
    institution = {No Organization},
    keyword = {Contrastive Learning, Data Curation, Language-Image Pre-training},
    month = {dec},
    publisher = {TIB},
    title = {DEMYSTIFYING CLIP DATA},
    url = {https://service.tib.eu/ldmservice/dataset/demystifying-clip-data},
    year = {2024}
}