COCO 5K

This is the dataset used in the paper on unpaired vision-language pre-training via cross-modal CutMix.

BibTeX: