ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models

ART•V is an efficient framework for auto-regressive video generation with diffusion models. It generates a single frame at a time, conditioned on the previous ones.
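
The frame-by-frame loop described above can be sketched as follows. This is a minimal illustration only, not the ART•V implementation: `generate_video`, `toy_sampler`, and the `context_size` window are hypothetical names and parameters, and the sampler is a toy stand-in for a diffusion model.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image tensor


def generate_video(
    prompt: str,
    num_frames: int,
    sample_frame: Callable[[str, Sequence[Frame]], Frame],
    context_size: int = 2,  # assumed window size, not taken from the source
) -> List[Frame]:
    """Generate frames one at a time, each conditioned on earlier frames."""
    frames: List[Frame] = []
    for _ in range(num_frames):
        # Condition on the most recent frames; empty for the first frame.
        context = frames[-context_size:]
        frames.append(sample_frame(prompt, context))
    return frames


def toy_sampler(prompt: str, context: Sequence[Frame]) -> Frame:
    """Placeholder for a diffusion sampler; NOT the ART•V model."""
    if not context:
        # First frame depends on the text prompt alone.
        return [float(len(prompt))]
    # Later frames are derived from the last frame, so the dependency is visible.
    return [value + 1.0 for value in context[-1]]


video = generate_video("a cat", num_frames=4, sample_frame=toy_sampler)
```

In a real system the sampler would run a conditional diffusion process per frame; the loop structure is the only part this sketch is meant to convey.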

Cite this as

Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong (2024). Dataset: ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models. https://doi.org/10.57702/qtfigx6j

DOI retrieved: December 2, 2024

Additional Info

Created: December 2, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.2311.18834
Author: Wenming Weng
More Authors: Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong
Homepage: https://warranweng.github.io/art.v