
CogVideo

CogVideo is a large-scale pretrained transformer for text-to-video generation. It was trained on 5.4 million captioned videos at a spatial resolution of 160×160 pixels.

Data and Resources

This dataset has no data

Cite this as

Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang (2024). Dataset: CogVideo. https://doi.org/10.57702/elttxmzw

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field: Value
Created: December 2, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.2205.15868
Author: Wenyi Hong
More Authors: Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
Homepage: https://github.com/THUDM/CogVideo