CogVideo

CogVideo is a large-scale pretrained transformer for text-to-video generation. It is trained on a dataset of 5.4 million captioned videos with a spatial resolution of 160×160.

Data and Resources

This dataset has no data

Cite this as

Wenyi Hong, Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang (2024). Dataset: CogVideo. https://doi.org/10.57702/elttxmzw

Private DOI: This DOI is not yet resolvable.
It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2205.15868
Author	Wenyi Hong
More Authors	Ming Ding, Wendi Zheng, Xinghan Liu, Jie Tang
Homepage	https://github.com/THUDM/CogVideo