26 datasets found

  • PortraitMode-400

    PortraitMode-400 is a dataset dedicated to portrait-mode video recognition, with a fine-grained taxonomy of 400 categories.
  • VideoLT

    The VideoLT dataset contains 1,004 classes and about 256,218 untrimmed videos collected from YouTube, covering a wide range of human activities, including everyday life,...
  • LocalStyleFool

    LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything
  • UCF101 and HMDB51 datasets

    The UCF101 and HMDB51 datasets are used for video recognition. The UCF101 dataset contains 101 action categories, while the HMDB51 dataset contains 51 classes.
  • Kinetics-400, Something-Something V2, Epic-Kitchens-100, HMDB51, and UCF101

    The paper evaluates on five video recognition benchmarks: Kinetics-400, Something-Something V2, Epic-Kitchens-100, HMDB51, and UCF101.
  • Diving48

    The Diving48 dataset is a fine-grained video dataset of competitive diving. It has ∼18k trimmed video clips of 48 unambiguous dive sequences standardized by the professional....
  • Mini-Kinetics

    The Mini-Kinetics dataset is a mini version of the Kinetics-400 dataset, containing 240k training samples and 20k validation samples in 400 human action classes.
  • HowTo100M

    A dataset used in the LORD framework for autonomous driving. It consists of images, videos, and text-based observations.
  • Moments in Time

    The Moments in Time dataset is a large-scale video action recognition dataset.
  • MoViNets: Mobile Video Networks for Efficient Video Recognition

    Mobile Video Networks (MoViNets) is a family of computation and memory efficient video networks that can operate on streaming video for online inference.
  • Moving MNIST

    Moving MNIST is a benchmark dataset for video recognition. It contains 10,000 samples: 8,000 for training and 2,000 for testing. Each sample consists of 20 sequential gray...
  • Jester

    The Jester dataset consists of continuous joke ratings ranging from -10 to 10, together with the jokes' texts.
  • Something-Something V1

    Video classification is a fundamental problem in many video-based tasks. Applications such as autonomous driving technology, controlling drones and robots are driving the demand...
  • Temporal-attentive Covariance Pooling Networks for Video Recognition

    Video recognition aims to automatically analyze the contents of videos (e.g., events and actions), and has a wide range of applications, including intelligent surveillance,...
  • Kinetics-600

    The Kinetics-600 dataset consists of 392k training videos and 30k validation videos in 600 human action categories.
  • Multi-Fiber Networks for Video Recognition

    The proposed multi-fiber architecture is used for reducing the computational cost of spatio-temporal deep neural networks, making them run as fast as their 2D counterparts while...
  • Kinetics-400

    Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming....
  • Something-Something V1 & V2

    The Something-Something V1 & V2 dataset is a large-scale video dataset created by crowdsourcing. It contains about 100k videos over 174 categories, and the number of videos...
  • UCF101

    The UCF101 dataset contains 13,320 videos distributed in 101 action categories. This dataset is different from the above ones in that it contains mostly coarse sports activities...
  • Kinetics-700

    Kinetics-700 is a large-scale video dataset for human action recognition, with 700 action categories.