Long-term Leap Attention, Short-term Periodic Shift for Video Classification
A video transformer naturally incurs a heavier computation burden than a static vision transformer, as the former processes a sequence T times longer than the latter.
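The cost gap above can be sketched with a small calculation. This is a minimal illustration, assuming ViT-style patch tokenization and vanilla quadratic self-attention; the patch size (16), resolution (224), and frame count (8) are illustrative choices, not values from the paper.

```python
def num_tokens(height: int = 224, width: int = 224, patch: int = 16, frames: int = 1) -> int:
    """Number of patch tokens for `frames` frames of an H x W input."""
    return frames * (height // patch) * (width // patch)

def attention_cost(tokens: int) -> int:
    """Pairwise-interaction count of full self-attention (proportional to N^2)."""
    return tokens * tokens

image_tokens = num_tokens(frames=1)   # 196 tokens for a single frame
video_tokens = num_tokens(frames=8)   # 1568 tokens for T = 8 frames

# The sequence grows linearly in T, but attention cost grows quadratically:
print(video_tokens // image_tokens)                                   # 8  (= T)
print(attention_cost(video_tokens) // attention_cost(image_tokens))   # 64 (= T^2)
```

This quadratic blow-up in T is what motivates restricted attention patterns (such as the leap attention and periodic shift named in the title) over full spatio-temporal attention.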
Kinetics and Something-Something V2 datasets
The datasets used in the paper for few-shot video classification, containing videos from the Kinetics and Something-Something V2 datasets.
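Few-shot evaluation on such datasets is typically organized into episodes. The sketch below shows one plausible N-way K-shot episode sampler, assuming a simple mapping from class names to clip identifiers; the catalog contents and the 5-way 1-shot setting are illustrative, not taken from the paper.

```python
import random

def sample_episode(videos_by_class, n_way=5, k_shot=1, q_queries=1, rng=None):
    """Pick n_way classes, then k_shot support and q_queries query clips per class."""
    rng = rng or random.Random(0)
    classes = rng.sample(sorted(videos_by_class), n_way)
    support, query = [], []
    for label in classes:
        clips = rng.sample(videos_by_class[label], k_shot + q_queries)
        support += [(clip, label) for clip in clips[:k_shot]]
        query += [(clip, label) for clip in clips[k_shot:]]
    return support, query

# Toy catalog standing in for Kinetics / Something-Something V2 clips.
catalog = {f"class_{i}": [f"clip_{i}_{j}" for j in range(4)] for i in range(10)}
support, query = sample_episode(catalog)
print(len(support), len(query))  # 5 5
```

A model is then adapted on the labeled support clips and evaluated on the query clips, with accuracy averaged over many such episodes.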
YouTube-8M: A Large-Scale Video Classification Benchmark
YouTube-8M is a large-scale video classification benchmark comprising millions of YouTube videos with machine-generated labels.
ImageNet and YouTube-8M
The paper does not explicitly describe its datasets; however, the authors mention using datasets such as ImageNet and YouTube-8M.
Condensed Movies
Condensed Movies is a dataset used for text-to-video retrieval and video classification tasks.
Kinetics400
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving and the control of drones and robots are driving demand for accurate video classification.
Kinetics dataset
The Kinetics dataset is a large-scale action recognition dataset. It contains videos of various actions performed by humans, annotated with the action shown.
Something-Something V1
Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving and the control of drones and robots are driving demand for accurate video classification.
Kinetics-600
The Kinetics-600 dataset consists of 392k training videos and 30k validation videos across 600 human action categories.
Structural Vision Transformer
Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and videos.
Kinetics-400
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming.
Something-Something V1 & V2
The Something-Something V1 & V2 datasets are large-scale video datasets created by crowdsourcing. V1 contains about 100k videos over 174 categories, and V2 expands this to about 220k videos over the same categories.
ActivityNet Captions
ActivityNet Captions is a benchmark dataset proposed for dense video captioning. It contains 20K untrimmed videos in total, and each video has several annotated segments, each paired with a descriptive sentence.
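Annotations of this kind are commonly distributed as JSON mapping each video id to a duration, a list of segment timestamps, and one sentence per segment. The sketch below parses such a structure; the concrete field names (`duration`, `timestamps`, `sentences`) and the sample record are assumptions for illustration, not guaranteed to match the official release.

```python
import json

# A toy record in the assumed dense-captioning annotation layout.
raw = json.loads("""
{
  "v_example": {
    "duration": 120.0,
    "timestamps": [[0.0, 15.5], [40.2, 90.0]],
    "sentences": ["A person enters the room.", "They sit down and read."]
  }
}
""")

def iter_segments(annotations):
    """Yield (video_id, start, end, sentence) for every annotated segment."""
    for vid, ann in annotations.items():
        for (start, end), sentence in zip(ann["timestamps"], ann["sentences"]):
            yield vid, start, end, sentence

segments = list(iter_segments(raw))
print(len(segments))  # 2
```

Pairing each timestamp with its sentence this way yields the (segment, caption) units on which dense video captioning models are trained and evaluated.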