CogVideo

CogVideo is a large-scale pretrained transformer for text-to-video generation. It is trained on a dataset of 5.4 million captioned videos at a spatial resolution of 160×160 pixels.

Cite this as

Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang (2024). Dataset: CogVideo. https://doi.org/10.57702/elttxmzw

DOI retrieved: December 2, 2024

Additional Info

Field         Value
Created       December 2, 2024
Last update   December 2, 2024
Defined In    https://doi.org/10.48550/arXiv.2205.15868
Author        Wenyi Hong
More Authors  Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
Homepage      https://github.com/THUDM/CogVideo