Multimodal C4 (mmc4)

Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from the public webpages in the cleaned English C4 corpus.

BibTeX: