-
HMDB51, UCF101, Kinetics-400, and IG65M
Four datasets are relevant for our study: HMDB51, UCF101, Kinetics-400, and IG65M. -
DAVIS and YouTube-VOS datasets
The DAVIS dataset comprises 60, 30 and 30 video sequences for training, validation and testing, respectively. The YouTube-VOS dataset is a larger dataset with 3471 videos for... -
Every Moment Counts
The Every Moment Counts dataset contains 1,000 hours of video footage of various activities. -
Hollywood in Homes
The Hollywood in Homes dataset contains 9,848 videos of daily activities across 157 classes. -
THUMOS Challenge
The THUMOS Challenge dataset contains 413 sports videos across 65 classes. -
Dual DETRs for Multi-Label Temporal Action Detection
Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. -
Multi Visual Modality Fall Detection Dataset (MUVIM)
The Multi Visual Modality Fall Detection Dataset (MUVIM) was used for anomaly detection of falls. It contains six vision-based sensors of different modalities including thermal,... -
UCF-101 dataset
The UCF-101 dataset is a large-scale action recognition dataset containing 13,320 videos in 101 human action categories. -
MultiTHUMOS
Temporal action localization (TAL) is a prevailing task due to its great application potential. Existing works in this field mainly suffer from two weaknesses: (1) They often... -
TemporalMaxer: Maximize Temporal Context with only Max Pooling
Temporal action localization (TAL) is a challenging task in video understanding that aims to identify and localize actions within a video sequence. -
Something-Something V1
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving and the control of drones and robots are driving the demand... -
Mini-Kinetics-200
Mini-Kinetics-200: A dataset of 200 human action classes from videos in the wild. -
Temporal-attentive Covariance Pooling Networks for Video Recognition
Video recognition aims to automatically analyze the contents of videos (e.g., events and actions), and has a wide range of applications, including intelligent surveillance,... -
Kinetics-600
The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories. -
Kinetics-400
Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming....