Dataset - LDM

EPIC-KITCHENS

EPIC-KITCHENS is a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we...
- Dataset
- JSON
Temporal Sentence Grounding in Videos

Temporal sentence grounding in videos (TSGV) is the task of retrieving the video segment that semantically corresponds to a natural-language query.
- Dataset
- JSON
WildDeepFakes

A challenging real-world dataset for deepfake detection.
- Dataset
- JSON
Temporal Deepfake Segment Benchmark

A deepfake detection method that can address the issue of modifying segments of videos using generative techniques.
- Dataset
- JSON
AGRE ADOS database, Kaggle database, and self-gathered video test dataset

The AGRE ADOS database, Kaggle database, and a self-gathered video test dataset with corresponding ADOS data.
- Dataset
- JSON
HMDB

The HMDB dataset consists of 6766 videos from 51 different action categories. The videos are generally of low quality, with strong camera motion, and non-centered people.
- Dataset
- JSON
InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and ...

InternVid: A large-scale video-text dataset for multimodal understanding and generation.
- Dataset
- JSON
Cholec80

The proposed network is implemented in PyTorch using a single Tesla V100-DGXS-32GB GPU of an NVIDIA DGX station. For the ResNet-50 part, PyTorch default ImageNet pretrained...
- Dataset
- JSON
YTF

Face recognition and person re-identification using paired image-attribute data, where the attributes (i.e., soft biometrics) are only available during the training phase.
- Dataset
- JSON
THUMOS Challenge: Action Recognition with a Large Number of Classes

THUMOS Challenge: Action Recognition with a Large Number of Classes.
- Dataset
- JSON
ActivityNet-1.3

Generating human action proposals in untrimmed videos is an important yet challenging task with wide applications. Current methods often suffer from the noisy boundary locations...
- Dataset
- JSON
VGGSound

The VGGSound dataset is a large-scale audio-visual dataset containing 10,000 10-second video clips with corresponding audio files.
- Dataset
- JSON
SDU Fall Dataset

The dataset contains videos of people performing normal activities and falls.
- Dataset
- JSON
UR Dataset

The dataset contains videos of people performing normal activities and falls.
- Dataset
- JSON
Spatio-Temporal Adversarial Learning for Detecting Unseen Falls

The proposed spatio-temporal adversarial learning framework for detecting unseen falls from videos.
- Dataset
- JSON
SEWA: A Large-Scale Video Dataset for Affective Computing

The SEWA dataset contains video clips annotated with facial landmarks, valence, and arousal.
- Dataset
- JSON
AFEW-VA: A Database for Valence and Arousal Estimation in-the-Wild

The AFEW-VA dataset contains video clips annotated with valence and arousal.
- Dataset
- JSON
ActivityNet v1.3

Temporal action proposal generation is an important task. Akin to object proposals, temporal action proposals are intended to capture “clips” or temporal intervals in videos...
- Dataset
- JSON
MuscleMap

The MuscleMap dataset is a large-scale video-based dataset for the in-the-wild Activated Muscle Group Estimation (AMGE) task. It contains 15,004 video clips with 135 different...
- Dataset
- JSON
MoVi

MoVi is a large multi-purpose human motion and video dataset.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

86 datasets found