Dataset - LDM

Learning to Predict Situation Hyper-Graphs for Video Question Answering

The SHG-VQA model predicts a situation hyper-graph composed of the actions and relations present in the input video.
- Dataset
- JSON
Action Search: Spotting Actions in Videos

This paper proposes an action search method for spotting actions in videos.
- Dataset
- JSON
SF-Net: Single-Frame Supervision for Temporal Action Localization

This paper proposes a method for temporal action localization using single-frame supervision. The authors use a unified system, SF-Net, to predict actionness scores and...
- Dataset
- JSON
Breakfast dataset

The Breakfast dataset contains 712 videos of people performing various activities, such as making coffee or scrambling eggs. The...
- Dataset
- JSON
SynAction dataset

A dataset used for unsupervised image-to-image translation, consisting of 20 possible actions performed by 10 different rendered human characters.
- Dataset
- JSON
PKU-MMD

A large-scale labeled RGB-Depth-Optical Flow dataset for action recognition, containing 1,074 long untrimmed videos with paired RGB, depth, and optical flow modalities for 51...
- Dataset
- JSON
AViD Dataset: Anonymized Videos from Diverse Countries

AViD is a new public video dataset for action recognition, containing action videos from diverse countries.
- Dataset
- JSON
Generative Action-description Prompts for Skeleton-based Action Recognition

Skeleton-based action recognition has recently received considerable attention. Current approaches to skeleton-based action recognition are typically formulated as one-hot...
- Dataset
- JSON
CATER

CATER (a dataset for Compositional Actions and TEmporal Reasoning) was released by [45] under the Apache 2.0 license.
- Dataset
- JSON
FaMoS dataset

The FaMoS dataset records 95 subjects, each performing 28 predefined actions (e.g., anger, disgust, fear, surprise).
- Dataset
- JSON
VOCA-2012

VOCA-2012 is an action recognition dataset with task-dependent eye-fixation data.
- Dataset
- JSON
N-EPIC Kitchens

Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including...
- Dataset
- JSON
THUMOS'13 Dataset

The THUMOS'13 dataset contains 24 classes and 3,207 videos.
- Dataset
- JSON
JHMDB Dataset

The JHMDB dataset contains 21 action classes spanning different domains (sports and daily activities).
- Dataset
- JSON
UCF Sports Dataset

The UCF Sports dataset contains 10 action classes. It is listed alongside JHMDB (21 classes) and THUMOS'13 (24 classes); together these datasets span different domains (sports and daily activities).
- Dataset
- JSON
Kinetics Skeleton 400

Kinetics Skeleton 400 is a dataset adapted from the Kinetics 400 video dataset by extracting 2D keypoints with the OpenPose toolbox.
- Dataset
- JSON
THUMOS'14, ActivityNet v1.3

Temporal action detection in untrimmed videos via multi-stage CNNs; CDC: convolutional-de-convolutional networks for precise temporal action localization; Temporal action...
- Dataset
- JSON
N-UCLA

The N-UCLA dataset is a widely used skeleton-based action recognition dataset containing 1,494 video clips of 10 volunteers.
- Dataset
- JSON
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

The dataset used in the paper is Kinetics, a large collection of YouTube video clips covering 400 human action classes.
- Dataset
- JSON
MPII Human Pose Dataset

Human pose estimation refers to the task of recognizing postures by localizing body keypoints (head, shoulders, elbows, wrists, knees, ankles, etc.) from images.
- Dataset
- JSON
