Dataset - LDM

Hollywood Extended dataset

The Hollywood Extended dataset contains 937 relatively short videos, averaging 30 seconds each, spanning 16 action classes.
- Dataset
- JSON
Breakfast dataset

The Breakfast dataset is another dataset used in the paper, which contains 712 videos of people performing various activities, such as making coffee or scrambling eggs. The...
- Dataset
- JSON
Multi-Mice PartsTrack dataset

The Multi-Mice PartsTrack dataset is a challenging dataset for multi-mice part tracking in videos. It contains 10 videos of two or three mice interacting freely in a home cage...
- Dataset
- JSON
My View is the Best View: Procedure Learning from Egocentric Videos

A dataset for procedure learning from egocentric videos.
- Dataset
- JSON
UCF Sports

The UCF Sports dataset consists of 150 videos from sports broadcasts covering 10 action categories.
- Dataset
- JSON
HMDB

The HMDB dataset consists of 6766 videos from 51 different action categories. The videos are generally of low quality, with strong camera motion, and non-centered people.
- Dataset
- JSON
YTF

Face recognition and person re-identification using paired image-attribute data, where the attributes (i.e., soft biometrics) are only available during the training phase.
- Dataset
- JSON
UCF

UCF is a dataset for face forgery detection, focusing on uncovering common features shared by different manipulation techniques.
- Dataset
- JSON
MovieQA, TVQA, AVSD, EQA, Embodied QA

A collection of datasets for visual question answering, including MovieQA, TVQA, AVSD, EQA, and Embodied QA.
- Dataset
- JSON
ADTH-QA

A dataset of audio-driven talking-head videos generated by four generative methods.
- Dataset
- JSON
SoccerNet

SoccerNet is a dataset for action spotting in soccer videos, containing a large number of annotated videos.
- Dataset
- JSON
SoccerNet-v2

SoccerNet-v2 is a large-scale dataset for action spotting in soccer videos, containing over 110K action labels.
- Dataset
- JSON
MPI-INF-3DHP dataset

The MPI-INF-3DHP dataset is a large-scale dataset for 3D human pose estimation in videos. It consists of 8 subjects performing 8 activities.
- Dataset
- JSON
Human3.6M dataset

The Human3.6M dataset is a large-scale dataset for 3D human pose estimation in videos. It consists of 3.6 million frames captured by four 50 Hz cameras.
- Dataset
- JSON
UCF-101 dataset for human action recognition

UCF-101 is a large-scale dataset of human actions in videos.
- Dataset
- JSON
IG65M

The dataset used in the paper for self-supervised learning of video representations.
- Dataset
- JSON
Kinetics dataset

The Kinetics dataset is a large-scale action recognition dataset. It contains videos of various actions performed by humans, with annotations of the actions performed.
- Dataset
- JSON
CVBL video database

CVBL video database for face recognition in videos.
- Dataset
- JSON
IJB-C

The dataset used in the FairFace Challenge at ECCV 2020 is a reannotated version of the IJB-C [37] database, enriched with 12,549 newly collected public-domain images.
- Dataset
- JSON
Tunnel Try-on

The Tunnel Try-on dataset is a collection of videos with product garment images.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

29 datasets found