-
MultiStudioBench
The MultiStudioBench dataset contains 25 subjects, such as objects and animals, with only a few images per subject. Images in the dataset are from previous works or...
-
WebVid-10M: A large-scale video dataset for text-to-video generation
-
ART•V: Auto-Regressive Text-to-Video Generation with Diffusion Models
ART•V is an efficient framework for auto-regressive video generation with diffusion models. It generates a single frame at a time, conditioned on the previous ones.
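
The frame-by-frame loop described above can be illustrated with a minimal sketch. It is not the ART•V implementation: `toy_frame_model` is a hypothetical stand-in for a conditional diffusion sampler, frames are plain lists of floats rather than images, and the `context` window size is an assumption introduced only for illustration.

```python
import random
from typing import Callable, List, Sequence

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def toy_frame_model(prompt: str, previous: Sequence[Frame],
                    rng: random.Random, size: int = 4) -> Frame:
    """Hypothetical placeholder for a diffusion sampler (not the real model).

    With no previous frames it draws a random first frame; otherwise it
    perturbs the most recent frame slightly, mimicking small motion.
    The prompt is accepted but unused in this toy version.
    """
    if not previous:
        return [rng.random() for _ in range(size)]
    last = previous[-1]
    return [min(1.0, max(0.0, p + rng.uniform(-0.05, 0.05))) for p in last]


def generate_video(prompt: str, num_frames: int,
                   frame_model: Callable[..., Frame] = toy_frame_model,
                   context: int = 2, seed: int = 0) -> List[Frame]:
    """Generate frames one at a time, each conditioned on earlier frames."""
    rng = random.Random(seed)
    frames: List[Frame] = []
    for _ in range(num_frames):
        # Condition on at most `context` of the most recent frames
        # (the window size is an assumption of this sketch).
        start = max(0, len(frames) - context)
        conditioning = frames[start:]
        frames.append(frame_model(prompt, conditioning, rng))
    return frames


video = generate_video("a cat walking on grass", num_frames=5)
print(len(video), "frames of", len(video[0]), "values each")
```

Because each frame depends only on frames already generated, the loop can in principle be extended to any length; a real system would replace `toy_frame_model` with a learned sampler.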