UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion ...
Video diffusion models have been developed for video generation and usually integrate text and image conditioning to enhance control over the generated content.
MSR-VTT and UCF-101
The datasets used in the paper are MSR-VTT and UCF-101, two public benchmarks for text-to-video generation. MSR-VTT contains 4,900 videos with 20 manually annotated captions each...
ModelScope text-to-video
The pre-trained text-to-video diffusion model used in the paper.
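As a point of reference, a ModelScope-style text-to-video checkpoint can be run through the Hugging Face diffusers library. The snippet below is a minimal sketch, assuming the publicly hosted damo-vilab/text-to-video-ms-1.7b weights, a CUDA GPU, and an example prompt and output path; none of these come from the paper itself.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the ModelScope text-to-video checkpoint (assumed Hugging Face hub id).
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
)
pipe = pipe.to("cuda")

# Text conditioning: the prompt steers the denoising of every generated frame.
prompt = "a panda playing guitar on a beach"
frames = pipe(prompt, num_inference_steps=25, num_frames=16).frames[0]

# Write the generated frames to an .mp4 file.
export_to_video(frames, "panda.mp4")
```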
MTVG: Multi-text Video Generation with Text-to-Video Models
The authors use a pre-trained diffusion-based text-to-video (T2V) generation model without any additional fine-tuning.
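To make the "pre-trained, no fine-tuning" setting concrete, the sketch below simply runs an off-the-shelf T2V pipeline once per prompt and concatenates the clips. This is only a naive illustration of training-free multi-prompt inference, not MTVG's actual method (which keeps successive segments temporally consistent); the checkpoint id and prompts are assumptions, not taken from the paper.

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Assumed pre-trained checkpoint; MTVG likewise builds on an existing T2V model.
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# Multiple prompts describing successive events (example prompts only).
prompts = [
    "a chef kneads dough on a wooden table",
    "the dough rises in a glass bowl",
    "a loaf of bread bakes in the oven",
]

# Naive baseline: generate each segment independently and concatenate the frames.
# MTVG instead enforces consistency across segments; this sketch does not.
all_frames = []
for prompt in prompts:
    segment = pipe(prompt, num_inference_steps=25, num_frames=16).frames[0]
    all_frames.extend(segment)

export_to_video(all_frames, "multi_prompt_video.mp4")
```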