3 datasets found

Groups: Image Classification
Formats: JSON

  • ImageNet and YouTube-8M

    The paper does not explicitly describe its dataset, but it mentions that the authors used datasets such as ImageNet and YouTube-8M.
  • 15 Scenes

    The dataset used in this paper is a benchmark for image and video classification. It contains 15 scene categories with 4,485 images, and 102 classes with 9,144 images. The dataset...
  • Structural Vision Transformer

    Structural Vision Transformer (StructViT) is a vision transformer network that leverages structural self-attention (StructSA) to capture correlation structures in images and...