VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix

Dataset entry for VLMixer, an approach to unpaired vision-language pre-training via cross-modal CutMix.

Cite this as

T. Wang, W. Jiang, Z. Lu, F. Zheng, R. Cheng, C. Yin, P. Luo (2024). Dataset: VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix. https://doi.org/10.57702/nv69pofd

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2312.12334
Author T. Wang
More Authors W. Jiang, Z. Lu, F. Zheng, R. Cheng, C. Yin, P. Luo