-
Kinetics400
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving technology, controlling drones and robots are driving the demand... -
Video-MNIST
Video-MNIST is a novel variant of the classic MNIST dataset. It contains 70000 sequences, each sequence containing 30 frames showing an affine transformation on a single... -
Something-Something V1
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving technology, controlling drones and robots are driving the demand... -
Structural Vision Transformer
Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and... -
Kinetics-400
Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming.... -
Something-Something V1 & V2
The Something-Something V1 & V2 dataset is a large-scale video dataset created by crowdsourcing. It contains about 100k videos over 174 categories, and the number of videos... -
Kinetics-700
Kinetics-700 is a large-scale video dataset for human action recognition, with 700 action categories. -
YouTube-8M
YouTube-8M is a large-scale video classification benchmark.