-
Kinetics400
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving technology, controlling drones and robots are driving the demand... -
Something-Something V1
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving technology, controlling drones and robots are driving the demand... -
Structural Vision Transformer
Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and... -
Kinetics-400
Motion has shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming.... -
Something-Something V1 & V2
The Something-Something V1 & V2 dataset is a large-scale video dataset created by crowdsourcing. It contains about 100k videos over 174 categories, and the number of videos... -
Youtube-8M
Youtube-8M is a large-scale video classification benchmark.