26 datasets found

  • AVA v2.2

    The AVA v2.2 dataset for spatiotemporal action localization contains the bounding box annotations and the corresponding action labels on keyframes.
  • Diving48

    The Diving48 dataset is a fine-grained video dataset of competitive diving. It has ∼18k trimmed video clips of 48 unambiguous dive sequences standardized by the professional....
  • Something-Something

    The Something-Something dataset consists of 174 fine-grained action categories that depict humans performing everyday actions with common objects. Recognizing actions in the...
  • Something-Something-V2

    The Something-Something-V2 dataset is a large-scale video action recognition dataset.
  • Something-Something v2 (SSv2)

    The Something-Something v2 (SSv2) dataset is a large collection of video clips of humans performing actions with everyday objects.
  • Fine-tuned CLIP Models are Efficient Video Learners

    This work explores the capability of a simple baseline called ViFi-CLIP (Video Fine-tuned CLIP) for adapting image-based CLIP to the video domain.
  • Mini-Kinetics

    The Mini-Kinetics dataset is a mini version of the Kinetics-400 dataset, containing 240k training samples and 20k validation samples in 400 human action classes.
  • HMDB51 dataset

    The HMDB51 dataset is a video dataset for human action recognition. It contains 6,767 videos annotated with 51 categories of human actions.
  • Kinetics-700 dataset

    The Kinetics-700 dataset is a large-scale video dataset for human action recognition. It contains 555,774 videos annotated with 700 categories of human actions.
  • Kinetics400

    Video classification is a fundamental problem in many video-based tasks; applications such as autonomous driving, drone control, and robotics are driving the demand...
  • IG65M

    The IG65M dataset consists of roughly 65 million public Instagram videos collected via action-related hashtags, and is used for large-scale weakly-supervised pre-training of video representations.
  • EGTEA Gaze+

    The EGTEA Gaze+ dataset offers approximately 10,000 samples of 106 non-scripted daily activities that occur in a kitchen.
  • HMDB51 and UCF101

    HMDB51 and UCF101 are two widely used benchmarks for human action recognition in videos, frequently evaluated together (see the loading sketch after this list).
  • Kinetics-400 and Something-Something-V2

    Kinetics-400 and Something-Something-V2 are two large-scale video benchmarks commonly used together to evaluate action recognition models.
  • Kinetics dataset

    The Kinetics dataset is a large-scale action recognition dataset. It contains video clips of humans performing a wide variety of actions, each annotated with an action label.
  • UCF-101 dataset

    The UCF-101 dataset is a large-scale action recognition dataset containing 13,320 videos spanning 101 human action classes.
  • Kinetics-400 and Kinetics-600

    The Kinetics-400 and Kinetics-600 datasets are large-scale video action recognition benchmarks covering 400 and 600 human action classes, respectively.
  • Kinetics-600

    The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories.
  • HMDB-51

    The HMDB-51 dataset is a human action recognition benchmark built from movies and web videos, with short clips spanning 51 action categories.
  • Kinetics-400

    The Kinetics-400 dataset is a large-scale video benchmark of short YouTube clips covering 400 human action classes.
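
Several of the benchmarks listed above (HMDB51, UCF101, Kinetics) have ready-made loaders in torchvision. The sketch below shows one possible way to iterate over HMDB51 clips; the directory paths are placeholders, and it assumes the videos and the official train/test split files have already been downloaded separately (torchvision does not download HMDB51 itself).

    # Minimal sketch (not from any of the listed papers): loading HMDB51 clips
    # with torchvision.datasets.HMDB51. Paths below are hypothetical.
    from torchvision import datasets

    video_root = "data/hmdb51/videos"             # placeholder: .avi files, one subfolder per class
    split_root = "data/hmdb51/test_train_splits"  # placeholder: official split .txt files

    hmdb51_train = datasets.HMDB51(
        root=video_root,
        annotation_path=split_root,
        frames_per_clip=16,     # each sample is a 16-frame clip
        step_between_clips=8,   # stride between consecutive clips of the same video
        fold=1,                 # HMDB51 defines three official train/test folds
        train=True,
    )

    # Each item is (video, audio, label); video is a uint8 tensor of shape [T, H, W, C].
    video, audio, label = hmdb51_train[0]
    print(video.shape, label)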