Dataset - LDM

InterVid-14M-aesthetics

The dataset used in the paper is InterVid-14M-aesthetics, which is a subset of InterVid-14M used to remove watermarks from generated videos.
- Dataset
- JSON
Dual-Motion Transfer GAN

Generating videos with content and motion variations is a challenging task in computer vision. The proposed model is trained in an end-to-end manner, without the need to utilize...
- Dataset
- JSON
UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion ...

Video Diffusion Models have been developed for video generation, usually integrating text and image conditioning to enhance control over the generated content.
- Dataset
- JSON
Mora: Enabling Generalist Video Generation via a Multi-Agent Framework

A video dataset for training a generalist video generation model.
- Dataset
- JSON
RefDrop

The dataset used in the paper for consistent image generation and video generation.
- Dataset
- JSON
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it...
- Dataset
- JSON
Lumiere

A dataset of 30M videos along with their text captions.
- Dataset
- JSON
S2DM: Sector-Shaped Diffusion Models for Video Generation

Diffusion models have achieved great success in image generation. However, when leveraging this idea for video generation, we face significant challenges in maintaining the...
- Dataset
- JSON
MSR-VTT and UCF-101

The datasets used in the paper are MSR-VTT and UCF-101, two public datasets for video-text generation. MSR-VTT contains 4,900 videos with 20 manually annotated captions for each...
- Dataset
- JSON
MultiStudioBench

The MultiStudioBench dataset contains 25 subjects, including objects, animals, etc., and there are few images for each subject. Images in the dataset are from previous works or...
- Dataset
- JSON
Fashion

The Fashion dataset is used for the human image animation task. It contains videos of humans performing different actions.
- Dataset
- JSON
Webvid

Webvid is a large-scale video dataset that contains internet videos.
- Dataset
- JSON
Video In-Context Learning

Video In-Context Learning (Vid-ICL) is a novel framework that extends in-context learning to video data.
- Dataset
- JSON
CogVideo

CogVideo is a large-scale pretrained transformer for text-to-video generation. It is trained on a dataset of 5.4 million captioned videos with a spatial resolution of 160×160.
- Dataset
- JSON
Airplanes Dataset

The dataset used for video generation and evaluation of the proposed iVGAN model.
- Dataset
- JSON
Stabilized Videos

The dataset used for video generation and evaluation of the proposed iVGAN model.
- Dataset
- JSON
Vidu

Vidu is a high-definition text-to-video generator that demonstrates strong abilities in various aspects, including duration, coherence, and dynamism of the generated videos, on...
- Dataset
- JSON
Open-Sora Plan

The dataset used in this paper for text-to-video generation, consisting of short video clips.
- Dataset
- JSON
VideoCrafter1

The dataset used in this paper for text-to-video generation, consisting of short video clips.
- Dataset
- JSON
VideoCrafter2

The dataset used in this paper for text-to-video generation, consisting of short video clips.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
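
As a rough illustration of programmatic access, the sketch below builds a search URL and summarizes a search response. It assumes the registry exposes a CKAN-style `package_search` action, which this page does not confirm. The base URL `registry.example.org`, the helper names `build_search_url` and `summarize`, and the sample payload are all hypothetical, so check the API Docs for the real endpoint and response shape.

```python
from urllib.parse import urlencode

# Hypothetical base URL (assumption); replace with the registry's real address.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarize(payload):
    """Return (total_count, titles) from a CKAN-style search response dict."""
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    result = payload["result"]
    # Fall back to the machine name when an entry has no display title.
    titles = [item.get("title", item.get("name", "")) for item in result["results"]]
    return result["count"], titles


# Offline sample shaped like the assumed response; not real API output.
sample = {
    "success": True,
    "result": {
        "count": 33,
        "results": [
            {"name": "lumiere", "title": "Lumiere"},
            {"name": "webvid", "title": "Webvid"},
        ],
    },
}

url = build_search_url("video generation")
count, titles = summarize(sample)
print(url)
print(count, titles)
```

To query a live service, the URL could be passed to `urllib.request.urlopen` and the body decoded with `json.load` before calling `summarize`. The sample is kept offline here so the sketch runs without network access.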

33 datasets found