- Dual-Motion Transfer GAN
  Generating videos with content and motion variations is a challenging task in computer vision. The proposed model is trained in an end-to-end manner, without the need to utilize...
- UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion ...
  Video diffusion models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content.
- Mora: Enabling Generalist Video Generation via a Multi-Agent Framework
  A video dataset for training a generalist video generation model.
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
  Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it...
- S2DM: Sector-Shaped Diffusion Models for Video Generation
  Diffusion models have achieved great success in image generation. However, when leveraging this idea for video generation, we face significant challenges in maintaining the...
- MSR-VTT and UCF-101
  The paper uses MSR-VTT and UCF-101, two public datasets for video-text generation. MSR-VTT contains 4,900 videos with 20 manually annotated captions for each...
- MultiStudioBench
  The MultiStudioBench dataset contains 25 subjects, including objects, animals, etc., with only a few images for each subject. Images in the dataset are from previous works or...
- Video In-Context Learning
  Video In-Context Learning (Vid-ICL) is a novel framework that extends in-context learning to video data.
- Airplanes Dataset
  The dataset used for video generation and evaluation of the proposed iVGAN model.
- Stabilized Videos
  The dataset used for video generation and evaluation of the proposed iVGAN model.
- Open-Sora Plan
  The dataset used in this paper for text-to-video generation, consisting of short video clips.
- VideoCrafter1
  The dataset used in this paper for text-to-video generation, consisting of short video clips.
- VideoCrafter2
  The dataset used in this paper for text-to-video generation, consisting of short video clips.
- MUG Facial Expression
  The MUG Facial Expression dataset, consisting of videos of 52 actors performing 6 different facial expressions.