-
Multimodal C4 (mmc4)
Multimodal C4 (mmc4) is a public, billion-scale corpus of images and text, constructed from public webpages contained in the cleaned English c4 corpus.
-
DialogCC: Large-Scale Multi-Modal Dialogue Dataset
A large-scale multi-modal dialogue dataset created with an automatic pipeline that filters by CLIP similarity.
-
Conceptual Captions
The dataset used in the paper "Scaling Laws of Synthetic Images for Model Training". It is used for supervised image classification and zero-shot classification tasks.
-
YouTube-8M
YouTube-8M is a large-scale video classification benchmark.