Video Classification - Groups

ImageNet and YouTube-8M

The paper does not explicitly describe its dataset, but it mentions that the authors used datasets such as ImageNet and YouTube-8M.

15 Scenes

The dataset used in this paper is a benchmark dataset for image and video classification. It contains 15 scenes with 4485 images, and 102 classes with 9144 images. The dataset...

Structural Vision Transformer

Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and...

3 datasets found
