CogVideo

CogVideo is a large-scale pretrained transformer for text-to-video generation. It is trained on a dataset of 5.4 million captioned videos with a spatial resolution of 160×160.

BibTeX: