MTVG: Multi-text Video Generation with Text-to-Video Models

The authors used a pre-trained diffusion-based text-to-video (T2V) generation model without additional fine-tuning.

BibTeX: