
RANKCLIP: Ranking-Consistent Language-Image Pretraining

Self-supervised contrastive learning models, such as CLIP, have set new benchmarks for vision-language models in many downstream tasks. However, their dependency on rigid one-to-one mappings overlooks the complex and often multifaceted relationships between and within texts and images. To this end, we introduce RANKCLIP, a novel pretraining method that extends beyond the rigid one-to-one matching framework of CLIP and its variants.
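
The exact RANKCLIP objective is defined in the paper referenced below. As a rough, non-authoritative sketch of how a ranking-consistency term might be combined with CLIP's one-to-one contrastive matching, the following PyTorch snippet pairs the standard symmetric InfoNCE loss with a hypothetical listwise KL term that aligns cross-modal similarity rankings with in-modal ones. The function names, the KL formulation, and the weight lam are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Standard CLIP-style symmetric InfoNCE over one-to-one image-text pairs.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)              # image -> text direction
    loss_t2i = F.cross_entropy(logits.t(), targets)          # text -> image direction
    return (loss_i2t + loss_t2i) / 2

def ranking_consistency_loss(image_emb, text_emb, temperature=0.07):
    # Hypothetical ranking-consistency term: treat each row of a similarity
    # matrix as a soft ranking over the batch, and encourage the cross-modal
    # rankings to agree with the corresponding in-modal rankings via KL divergence.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    cross_i2t = image_emb @ text_emb.t() / temperature       # image -> text similarities
    cross_t2i = cross_i2t.t()                                # text -> image similarities
    in_image = image_emb @ image_emb.t() / temperature       # in-modal image similarities
    in_text = text_emb @ text_emb.t() / temperature          # in-modal text similarities
    kl_i = F.kl_div(F.log_softmax(cross_i2t, dim=-1),
                    F.softmax(in_text, dim=-1), reduction="batchmean")
    kl_t = F.kl_div(F.log_softmax(cross_t2i, dim=-1),
                    F.softmax(in_image, dim=-1), reduction="batchmean")
    return (kl_i + kl_t) / 2

def rank_consistent_loss(image_emb, text_emb, lam=0.5):
    # Combined objective: one-to-one contrastive matching plus the
    # (illustrative) ranking-consistency regularizer, weighted by lam.
    return clip_contrastive_loss(image_emb, text_emb) + lam * ranking_consistency_loss(image_emb, text_emb)

if __name__ == "__main__":
    # Toy usage on random embeddings (batch of 8, dimension 512).
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(rank_consistent_loss(img, txt).item())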

Data and Resources

This dataset has no data

Cite this as

Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun (2024). Dataset: RANKCLIP: Ranking-Consistent Language-Image Pretraining. https://doi.org/10.57702/tk8iqkz1

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 3, 2024
Last update: December 3, 2024
Defined in: https://doi.org/10.48550/arXiv.2404.09387
Author: Yiming Zhang
More authors: Zhuokai Zhao, Zhaorun Chen, Zhili Feng, Zenghui Ding, Yining Sun
Homepage: https://github.com/Jam1ezhang/RankCLIP