- SEWA: A Large-Scale Video Dataset for Affective Computing
  The SEWA dataset contains video clips annotated with facial landmarks, valence, and arousal.
- AFEW-VA: A Database for Valence and Arousal Estimation in-the-Wild
  The AFEW-VA dataset contains video clips annotated per-frame with valence and arousal; an annotation-loading sketch follows this list.
- Charades dataset
  The Charades dataset is a benchmark for human action recognition, containing 9,848 videos of daily indoor activities annotated with 157 action classes and roughly 66,500 temporal action instances.
- Kinship Video (KIVI) Face Database
  The Kinship Video (KIVI) face database contains face videos of 503 individuals with in-the-wild variations in pose, illumination, occlusion, ethnicity, and expression.
- Kinship verification from videos using spatio-temporal texture features and deep learning
  A video-based kinship verification framework that combines spatio-temporal texture features with deep learning.
- Like father, like son: Facial expression dynamics for kinship verification
  A video-based kinship verification framework using facial expression dynamics.
- Supervised Mixed Norm Autoencoder for Kinship Verification in Unconstrained Videos
  A deep learning framework for kinship verification in unconstrained videos using a novel Supervised Mixed Norm regularization Autoencoder (SMNAE); a generic mixed-norm penalty sketch follows this list.
- ActivityNet v1.3
  ActivityNet v1.3 is a large-scale benchmark of around 20,000 untrimmed videos covering 200 activity classes, widely used for temporal action proposal generation. Analogous to object proposals in images, temporal action proposals are intended to capture "clips", i.e., temporal intervals in a video that are likely to contain an action; a temporal-IoU sketch follows this list.
- UCF-24 and JHMDB-21
  UCF-24 (a 24-class subset of UCF101 with spatio-temporal annotations) and JHMDB-21 (21 action classes) are two public benchmarks used to evaluate action detection algorithms.
- SoccerNet-v2
  SoccerNet-v2 is a large-scale dataset for action spotting in untrimmed broadcast soccer videos, containing over 110K action labels across 17 classes; a tolerance-window matching sketch follows this list.
- UVG dataset
  UVG (Ultra Video Group) is a set of high-resolution (4K) test video sequences captured at 50/120 fps, widely used to evaluate video codecs and neural video representations. The source paper does not describe the dataset explicitly, noting only that the model was tested on various signal reconstruction tasks: 1D sinusoidal... A 1D signal-fitting sketch follows this list.
- ZJU-MoCap Dataset
  The ZJU-MoCap dataset is a prominent benchmark for human modeling from videos. The dataset includes 6 human subjects, each with 100 frames collected from one camera for training...
- ActivityNet v1.2
  ActivityNet v1.2 is a standard benchmark for Weakly-Supervised Temporal Action Localization (WSTAL), which aims to localize actions in untrimmed videos given only video-level labels; a top-k pooling sketch follows this list.
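
Per-frame affect annotations such as AFEW-VA's are typically shipped as one annotation file per clip. Below is a minimal loading sketch, assuming a per-clip JSON with a `frames` map carrying `valence` and `arousal` fields; the field names and file layout are assumptions, not the confirmed release schema.

```python
# Hypothetical AFEW-VA-style annotation loader. The field names
# ("frames", "valence", "arousal") are assumed, not a confirmed schema;
# adapt to the actual release layout.
import json
from pathlib import Path

def load_clip_annotations(json_path):
    """Return a list of (frame_id, valence, arousal) tuples for one clip."""
    with open(json_path) as f:
        clip = json.load(f)
    records = []
    for frame_id, frame in sorted(clip["frames"].items()):
        records.append((frame_id, frame["valence"], frame["arousal"]))
    return records

if __name__ == "__main__":
    for path in sorted(Path("afew_va_annotations").glob("*.json")):
        annotations = load_clip_annotations(path)
        print(path.stem, len(annotations), "annotated frames")
```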
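
For the SMNAE entry above: the paper's exact regularizer is not reproduced here, but a generic mixed norm such as the L2,1 norm (the sum of row-wise L2 norms) illustrates what mixing norms to induce structured sparsity looks like. The weight shape and regularization strength below are placeholders.

```python
# Generic L2,1 mixed-norm penalty on a weight matrix: the L2 norm of
# each row, summed over rows. An illustration of a mixed-norm
# regularizer only, not the SMNAE formulation from the paper.
import numpy as np

def l21_norm(W):
    """Sum of row-wise L2 norms: pushes entire rows of W toward zero."""
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1)))

W = np.random.randn(64, 128)   # placeholder encoder weights
reconstruction_loss = 0.0      # placeholder for the autoencoder loss term
lam = 1e-3                     # regularization strength (assumed)
total_loss = reconstruction_loss + lam * l21_norm(W)
print(total_loss)
```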
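
Temporal action proposals (ActivityNet v1.3 entry) are scored against ground-truth intervals by temporal IoU, the 1D analogue of bounding-box IoU. A minimal sketch:

```python
# Temporal IoU between a proposal and a ground-truth action interval,
# the standard overlap measure for scoring temporal action proposals.
def temporal_iou(proposal, ground_truth):
    """proposal, ground_truth: (start, end) in seconds, with start < end."""
    p_start, p_end = proposal
    g_start, g_end = ground_truth
    inter = max(0.0, min(p_end, g_end) - max(p_start, g_start))
    union = (p_end - p_start) + (g_end - g_start) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((5.0, 15.0), (10.0, 20.0)))  # 0.333...
```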
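
Action spotting (SoccerNet-v2 entry) is evaluated by matching predicted timestamps to ground-truth ones within a tolerance window. The sketch below does greedy one-to-one matching at a single fixed tolerance; the official evaluation sweeps several tolerances, so this is a simplification.

```python
# Greedy one-to-one matching of predicted action timestamps (seconds)
# to ground-truth timestamps within a fixed tolerance window.
def match_spots(predictions, ground_truths, tolerance=5.0):
    """predictions: (timestamp, confidence) pairs; returns true-positive count."""
    matched = set()
    true_positives = 0
    # Evaluate the most confident predictions first.
    for ts, _conf in sorted(predictions, key=lambda p: -p[1]):
        for i, gt in enumerate(ground_truths):
            if i not in matched and abs(ts - gt) <= tolerance:
                matched.add(i)
                true_positives += 1
                break
    return true_positives

preds = [(12.0, 0.9), (33.5, 0.7), (80.0, 0.4)]
gts = [10.0, 35.0]
print(match_spots(preds, gts))  # 2
```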
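
For the 1D sinusoidal reconstruction task mentioned in the UVG entry, fitting a coordinate-based MLP to y = sin(x) is the simplest instance. The architecture and hyperparameters below are illustrative assumptions, not the paper's model.

```python
# Minimal sketch of a 1D signal-reconstruction task: fit a small
# coordinate-based MLP to y = sin(x). Architecture and hyperparameters
# are assumptions for illustration only.
import torch

x = torch.linspace(-3.14, 3.14, 256).unsqueeze(1)  # input coordinates
y = torch.sin(x)                                   # target 1D signal

model = torch.nn.Sequential(
    torch.nn.Linear(1, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    opt.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    opt.step()

print(f"final MSE: {loss.item():.6f}")
```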
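
With only video-level labels (ActivityNet v1.2 entry), WSTAL methods commonly score every snippet per class and pool the top-k snippet scores into a video-level prediction that a standard classification loss can train. A minimal sketch of that pooling step, with shapes and k chosen for illustration:

```python
# Top-k temporal pooling: aggregate per-snippet class scores into a
# video-level score trainable with video-level labels only.
import numpy as np

def topk_video_scores(snippet_scores, k=8):
    """snippet_scores: (T, C) array of per-snippet, per-class scores.
    Returns a (C,) video-level score: the mean of each class's top-k snippets."""
    T, _C = snippet_scores.shape
    k = min(k, T)
    # Sort each class's scores over time and average the k largest.
    top = np.sort(snippet_scores, axis=0)[-k:]
    return top.mean(axis=0)

scores = np.random.randn(120, 100)  # 120 snippets, 100 classes (ActivityNet v1.2)
print(topk_video_scores(scores).shape)  # (100,)
```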