LAION COCO 600M

The dataset used for training the text-to-video model consists of 20 million videos and 600 million images.

BibTeX: