VideoMap: Supporting Video Editing Exploration, Brainstorming, and Prototyping in the Latent Space
VideoMap is a proof-of-concept video editing interface that operates on video frames projected onto a latent space, enabling users to visually uncover patterns and relationships.
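To make the latent-space idea concrete, the minimal sketch below embeds sampled frames with a generic pretrained image backbone and projects them to 2-D so related frames cluster visually; the backbone (ResNet-18) and the projection method (t-SNE) are illustrative assumptions, not VideoMap's actual pipeline.

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def embed_frames(frames):
    """frames: list of PIL images sampled from one or more videos."""
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    backbone.fc = torch.nn.Identity()          # keep the 512-d penultimate features
    backbone.eval()
    preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return backbone(batch)                 # (N, 512) frame embeddings

def plot_video_map(frames):
    feats = embed_frames(frames).numpy()
    # Project to 2-D so frames with similar content land near each other.
    xy = TSNE(n_components=2, perplexity=min(30, len(frames) - 1)).fit_transform(feats)
    plt.scatter(xy[:, 0], xy[:, 1], s=12)      # each point is one frame
    plt.title("Video frames projected onto a 2-D latent map")
    plt.show()
```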
InstructVid2Vid dataset
The dataset used to train the InstructVid2Vid model, consisting of (video, instruction, edited video) triplets.
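A minimal sketch of how one such triplet might be represented; the field names are assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class EditTriplet:
    source_video: str   # path to the original clip
    instruction: str    # natural-language editing instruction
    edited_video: str   # path to the clip after the edit is applied

sample = EditTriplet(
    source_video="clips/0001.mp4",
    instruction="make the sky look like a sunset",
    edited_video="clips/0001_edited.mp4",
)
```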
MotionFollower
The paper does not explicitly describe its training dataset; the authors mention collecting 3K videos (60-90 seconds long) from the internet to train their model.
MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers
Text-based video editing using MaskINT, a two-stage pipeline involving keyframe joint editing and structure-aware frame interpolation.
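The toy sketch below illustrates only the non-autoregressive, confidence-scheduled infilling idea (in the spirit of MaskGIT-style decoding): every masked frame is predicted in parallel each round, and only the most confident predictions are committed. The blend-based predictor and confidence score are stand-ins; the actual method uses a learned masked transformer over tokenized frames.

```python
import numpy as np

def masked_interpolate(key_start, key_end, num_frames, rounds=3):
    """Fill the frames between two edited keyframe arrays, a few at a time."""
    frames = np.zeros((num_frames,) + key_start.shape)
    known = np.zeros(num_frames, dtype=bool)
    frames[0], frames[-1] = key_start, key_end
    known[0] = known[-1] = True

    def predict(t):                      # stand-in "model": blend the keyframes
        w = t / (num_frames - 1)
        return (1 - w) * key_start + w * key_end, 1 - abs(0.5 - w)

    for _ in range(rounds):
        masked = np.flatnonzero(~known)
        if masked.size == 0:
            break
        # Non-autoregressive step: predict every masked frame in parallel,
        # then commit only the most confident half this round.
        preds = {t: predict(t) for t in masked}
        keep = sorted(preds, key=lambda t: -preds[t][1])[: max(1, len(preds) // 2)]
        for t in keep:
            frames[t], known[t] = preds[t][0], True

    for t in np.flatnonzero(~known):     # final pass: commit any leftovers
        frames[t] = predict(t)[0]
    return frames
```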
Emu Video Edit Training Dataset
The Emu Video Edit model's training dataset, containing 1600 videos with 7 editing instructions each.
EVE: Efficient zero-shot text-based Video Editing
Zero-shot text-based video editing with depth-map guidance and temporal consistency constraints.
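A minimal sketch of a temporal consistency constraint of this kind, assuming the edited frames are optimized in a latent space; the paper's exact loss may differ.

```python
import torch

def temporal_consistency_loss(latents: torch.Tensor) -> torch.Tensor:
    """latents: (T, C, H, W) latent codes for T consecutive edited frames."""
    # Penalize frame-to-frame differences so adjacent frames do not flicker.
    diffs = latents[1:] - latents[:-1]
    return diffs.pow(2).mean()

# Usage: add the penalty to the editing objective during optimization, e.g.
# loss = edit_loss + 0.1 * temporal_consistency_loss(frame_latents)
```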
Make-a-protagonist: Generic video editing with an ensemble of experts
A generic video editing framework that performs edits by coordinating an ensemble of expert models.
Zero-shot video editing using off-the-shelf image diffusion models
Performs video editing zero-shot with pre-trained, off-the-shelf image diffusion models, requiring no task-specific training.
ControlVideo
ControlVideo is a general framework for utilizing T2I diffusion models for one-shot video editing; it incorporates additional conditions such as edge maps, the key frame, and...
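A small sketch of preparing the edge-map condition for one frame, assuming OpenCV's Canny detector as the edge extractor; the diffusion model call itself is elided.

```python
import cv2
import numpy as np

def edge_condition(frame_bgr: np.ndarray, low: int = 100, high: int = 200) -> np.ndarray:
    """Turn one video frame into an edge-map condition for the T2I model."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)               # (H, W) uint8 edge map
    return np.repeat(edges[:, :, None], 3, axis=2)   # 3-channel map, ControlNet-style
```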
Tune-A-Video
The dataset of video examples used in the Tune-A-Video paper for its video editing tasks.
EI2 model for text-driven video editing
The paper does not explicitly describe a dedicated dataset; the authors mention using the DAVIS dataset and gathering face videos from the Pexels website.
DAVIS and WebVid datasets
The paper does not explicitly describe its dataset; the authors mention using 26 text-video pairs from the public DAVIS and WebVid datasets.