InternVid-14M-aesthetics
The dataset used in the paper is InternVid-14M-aesthetics, a subset of InternVid-14M filtered for aesthetic quality and used to avoid watermarks in generated videos.
Video Generation from Text Employing Latent Path Construction for Temporal Mo...
Video generation is one of the most challenging tasks in machine learning and computer vision. In this paper, we tackle the text-to-video generation problem,...
Video Generative Patch Nearest Neighbors (VGPNN)
A non-parametric approach for video generation from a single video, outperforming single-video GANs in visual quality and realism.
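To make "patch nearest neighbors" concrete, here is a minimal, illustrative sketch in the spirit of such non-parametric methods, not the authors' implementation: spatio-temporal patches of a perturbed query video are replaced by their nearest patches from the single source video, and overlapping patches are averaged back into frames. The function names, patch sizes, and brute-force search are assumptions made for illustration.

```python
import numpy as np

def extract_patches(video, pt=3, ps=5, stride=2):
    # video: (T, H, W) grayscale array; returns flattened 3D patches and their top-left coords.
    T, H, W = video.shape
    patches, coords = [], []
    for t in range(0, T - pt + 1, stride):
        for y in range(0, H - ps + 1, stride):
            for x in range(0, W - ps + 1, stride):
                patches.append(video[t:t + pt, y:y + ps, x:x + ps].ravel())
                coords.append((t, y, x))
    return np.stack(patches), coords

def pnn_resynthesize(source, query, pt=3, ps=5, stride=2):
    # Replace each query patch with its nearest source patch (L2 distance),
    # then average overlapping patches back into a video of the query's shape.
    src_patches, _ = extract_patches(source, pt, ps, stride)
    qry_patches, coords = extract_patches(query, pt, ps, stride)
    out = np.zeros(query.shape, dtype=np.float64)
    weight = np.zeros(query.shape, dtype=np.float64)
    for q, (t, y, x) in zip(qry_patches, coords):
        d = ((src_patches - q) ** 2).sum(axis=1)      # brute-force nearest-neighbour search
        best = src_patches[np.argmin(d)].reshape(pt, ps, ps)
        out[t:t + pt, y:y + ps, x:x + ps] += best
        weight[t:t + pt, y:y + ps, x:x + ps] += 1.0
    return out / np.maximum(weight, 1e-8)

# Usage: perturb the single source video and resynthesize it to obtain a new variant.
src = np.random.rand(8, 32, 32)                       # stand-in for a real video
qry = src + 0.1 * np.random.randn(*src.shape)
variant = pnn_resynthesize(src, qry)
```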
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion ...
Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content.
Mora: Enabling Generalist Video Generation via a Multi-Agent Framework
A multi-agent framework that enables generalist video generation.
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it...
S2DM: Sector-Shaped Diffusion Models for Video Generation
Diffusion models have achieved great success in image generation. However, when leveraging this idea for video generation, we face significant challenges in maintaining the...
StyleVideoGAN
StyleVideoGAN: A Temporal Generative Model using a Pretrained StyleGAN
MOFA-Video
MOFA-Video is a controllable image animation method that generates video from a given image using various additional controllable signals.
MSR-VTT and UCF-101
The datasets used in the paper are MSR-VTT and UCF-101, two public benchmarks for text-to-video generation. MSR-VTT contains 4,900 videos with 20 manually annotated captions for each...
SoloDance Dataset
The SoloDance dataset contains 179 solo dance videos in real scenes, collected online.
iPER Dataset
The iPER dataset, proposed by [25], was collected in a laboratory environment.
REMOT: A Region-to-Whole Framework for Realistic Human Motion Transfer
Human Video Motion Transfer (HVMT) aims to generate, given an image of a source person, a video of that person imitating the motion of a driving person.
Events-to-Video: Bringing Modern Computer Vision to Event Cameras
E2VID is a pipeline that reconstructs video sequences from event-camera data.
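E2VID itself is a learned, recurrent reconstruction network; the sketch below only illustrates what the input event data looks like by implementing the naive event-accumulation baseline that such methods improve upon. The (t, x, y, polarity) tuple layout and all names are assumptions for illustration.

```python
import numpy as np

def accumulate_events(events, height, width, t_start, t_end, n_frames):
    # events: array of (t, x, y, polarity) rows, with polarity in {-1, +1}.
    # Each output frame sums the polarities of events falling into its time bin.
    frames = np.zeros((n_frames, height, width), dtype=np.float32)
    edges = np.linspace(t_start, t_end, n_frames + 1)
    for t, x, y, p in events:
        k = np.searchsorted(edges, t, side="right") - 1
        if 0 <= k < n_frames:
            frames[k, int(y), int(x)] += p
    return frames

# Usage with synthetic events (timestamps, pixel coordinates, random polarities).
rng = np.random.default_rng(0)
events = np.stack([rng.uniform(0.0, 1.0, 1000),
                   rng.integers(0, 64, 1000),
                   rng.integers(0, 48, 1000),
                   rng.choice([-1.0, 1.0], 1000)], axis=1)
frames = accumulate_events(events, height=48, width=64, t_start=0.0, t_end=1.0, n_frames=10)
```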
Sky Time-lapse
A dataset of sky time-lapse videos used for video generation.