LocalStyleFool
LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything
UCF101 and HMDB51 datasets
The UCF101 and HMDB51 datasets are video recognition benchmarks. UCF101 contains 101 action categories, and HMDB51 contains 51.
Kinetics-400, Something-Something V2, Epic-Kitchens-100, HMDB51, and UCF101
The paper uses five video recognition benchmarks: Kinetics-400, Something-Something V2, Epic-Kitchens-100, HMDB51, and UCF101.
Mini-Kinetics
The Mini-Kinetics dataset is a mini version of the Kinetics-400 dataset, containing 240k training samples and 20k validation samples in 400 human action classes.
Moments in Time
The Moments in Time dataset is a large-scale video action recognition dataset.
MoViNets: Mobile Video Networks for Efficient Video Recognition
Mobile Video Networks (MoViNets) are a family of computation- and memory-efficient video networks that can operate on streaming video for online inference.
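One way to picture "operating on streaming video" is a causal temporal convolution that caches the last few inputs in a small buffer, so each new frame is processed on arrival instead of re-reading the whole clip. The sketch below is a simplified, hypothetical illustration and not the MoViNets implementation: it assumes scalar per-frame features and a single 1-D kernel, and the names `causal_conv_full` and `StreamingCausalConv` are ours. It only demonstrates that frame-by-frame processing with a buffer reproduces whole-clip processing.

```python
from collections import deque


def causal_conv_full(frames, kernel):
    """Causal 1-D temporal convolution over a whole clip, zero-padded on the left.

    Output at time t depends only on frames t-k+1 .. t (never on future frames).
    """
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(frames)
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(frames))
    ]


class StreamingCausalConv:
    """Hypothetical streaming version: one frame in, one output out."""

    def __init__(self, kernel):
        self.kernel = list(kernel)
        # The buffer holds the k-1 most recent frames (oldest first). Starting it
        # with zeros matches the left padding used by causal_conv_full.
        k = len(self.kernel)
        self.buffer = deque([0.0] * (k - 1), maxlen=k - 1)

    def step(self, frame):
        window = list(self.buffer) + [frame]
        out = sum(w * x for w, x in zip(self.kernel, window))
        self.buffer.append(frame)  # the oldest frame drops out automatically
        return out


if __name__ == "__main__":
    clip = [1.0, 2.0, 3.0, 4.0]
    kernel = [0.25, 0.5, 0.25]
    stream = StreamingCausalConv(kernel)
    print(causal_conv_full(clip, kernel))
    print([stream.step(f) for f in clip])
```

Because the buffer stores only k-1 frames, memory stays constant no matter how long the stream runs, which is the property that makes online inference feasible.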
Moving MNIST
Moving MNIST is a benchmark dataset for video recognition. It has 10,000 samples: 8,000 for training and 2,000 for testing. Each sample consists of 20 sequential gray...
Something-Something V1
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving and the control of drones and robots are driving the demand...
Temporal-attentive Covariance Pooling Networks for Video Recognition
Video recognition aims to automatically analyze the contents of videos (e.g., events and actions), and has a wide range of applications, including intelligent surveillance,...
Kinetics-600
The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories.
Multi-Fiber Networks for Video Recognition
The proposed multi-fiber architecture reduces the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while...
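The multi-fiber idea splits a dense layer into several lightweight parallel paths ("fibers"). Assuming each fiber can be modeled as one group of a grouped 3-D convolution, the saving follows from simple arithmetic: splitting C input and C output channels into g fibers divides the weight count, and therefore the multiply-accumulate operations (MACs), by g. This is a back-of-the-envelope sketch under that assumption; the function names, channel counts, and output shape are hypothetical, and it ignores biases and any modules that exchange information between fibers.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Weight count of a 3-D convolution split into `groups` parallel paths (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must be divisible by groups")
    kt, kh, kw = kernel
    # Each group connects c_in/groups inputs to c_out/groups outputs.
    return (c_in // groups) * (c_out // groups) * kt * kh * kw * groups


def conv3d_macs(c_in, c_out, out_shape, kernel=(3, 3, 3), groups=1):
    """Multiply-accumulates: every weight is applied once per output position."""
    t, h, w = out_shape
    return conv3d_params(c_in, c_out, kernel, groups) * t * h * w


if __name__ == "__main__":
    dense = conv3d_params(256, 256)             # 256 * 256 * 27 = 1,769,472
    fibers = conv3d_params(256, 256, groups=16)  # 16 paths of 16 * 16 * 27
    print(dense, fibers, dense // fibers)
```

With 16 fibers, the layer needs one sixteenth of the weights and MACs of the dense layer, at the cost of no direct connections between channels in different fibers.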
Kinetics-400
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming....
Something-Something V1 & V2
The Something-Something V1 & V2 dataset is a large-scale video dataset created by crowdsourcing. It contains about 100k videos over 174 categories, and the number of videos...
Kinetics-700
Kinetics-700 is a large-scale video dataset for human action recognition, with 700 action categories.