- Winoground
  The Winoground dataset consists of 400 items, each containing two image-caption pairs, (I0, C0) and (I1, C1).
- DataComp-10M
  DataComp-10M is used as a pretraining dataset.
- CC3M and CC12M
  CC3M and CC12M are used as datasets for training and evaluation.
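To make the Winoground item layout concrete, the following is a minimal sketch of one item and of how such an item is commonly scored. The class name `WinogroundItem`, its field names, and the `score(caption, image)` callable are hypothetical names chosen for illustration. The text, image, and group criteria follow the usual Winoground benchmark definition, which this section itself does not spell out, so treat them as an assumption about the evaluation protocol rather than a description of the setup used here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class WinogroundItem:
    """One item: two image-caption pairs (I0, C0) and (I1, C1).

    Field names are hypothetical; images are represented by string identifiers.
    """
    image_0: str    # I0
    caption_0: str  # C0
    image_1: str    # I1
    caption_1: str  # C1


# Hypothetical scorer: score(caption, image) -> similarity, higher is better.
Score = Callable[[str, str], float]


def text_correct(item: WinogroundItem, score: Score) -> bool:
    # For each image, the matching caption must outscore the other caption.
    return (
        score(item.caption_0, item.image_0) > score(item.caption_1, item.image_0)
        and score(item.caption_1, item.image_1) > score(item.caption_0, item.image_1)
    )


def image_correct(item: WinogroundItem, score: Score) -> bool:
    # For each caption, the matching image must outscore the other image.
    return (
        score(item.caption_0, item.image_0) > score(item.caption_0, item.image_1)
        and score(item.caption_1, item.image_1) > score(item.caption_1, item.image_0)
    )


def group_correct(item: WinogroundItem, score: Score) -> bool:
    # An item counts for the group score only if both criteria hold.
    return text_correct(item, score) and image_correct(item, score)
```

Under this sketch, a benchmark-level score would be the fraction of the 400 items for which the corresponding function returns `True`.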