-
SketchyCOCO
SketchyCOCO: A large-scale scene sketch dataset with fine-grained alignment among sketch, text, and photo.
-
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture...