-
Fine-tuned CLIP Models are Efficient Video Learners
This work explores the capability of a simple baseline called ViFi-CLIP (Video Fine-tuned CLIP) for adapting image-based CLIP to the video domain. -
Mini-Kinetics
The Mini-Kinetics dataset is a mini version of the Kinetics-400 dataset, containing 240k training samples and 20k validation samples in 400 human action classes. -
HMDB51 and UCF101
The datasets used in the paper are HMDB51 and UCF101. -
Kinetics-400 and Something-Something-V2
The datasets used in the paper are Kinetics-400 and Something-Something-V2. -
Kinetics-400 and Kinetics-600
The Kinetics-400 and Kinetics-600 datasets are video understanding datasets used for learning rich and multi-scale spatiotemporal semantics from high-dimensional videos. -
Kinetics-600
The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories. -
Kinetics-400
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming.... -
AVA-Kinetics
The AVA-Kinetics dataset is a video dataset of localized human actions. -
ActivityNet
Temporal activity detection has drawn increasing interest in both the academic and industry communities due to its vast potential applications in security surveillance, behavior...