UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models

Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content.

BibTeX: