-
Learning to Predict Situation Hyper-Graphs for Video Question Answering
The SHG-VQA model predicts a situation hyper-graph structure composed of existing actions and relations in the input video. -
Action Search: Spotting Actions in Videos
This paper proposes action search, a method for spotting actions in videos. -
SF-Net: Single-Frame Supervision for Temporal Action Localization
This paper proposes a method for temporal action localization using single-frame supervision. The authors use a unified system called SF-Net to predict actionness scores and... -
Breakfast dataset
The Breakfast dataset contains 712 videos of people performing various activities, such as making coffee or scrambling eggs. The... -
SynAction dataset
SynAction is a dataset used for unsupervised image-to-image translation. It consists of 20 possible actions performed by 10 different human renders. -
AViD Dataset: Anonymized Videos from Diverse Countries
AViD is a new public video dataset for action recognition, containing action videos from diverse countries. -
Generative Action-description Prompts for Skeleton-based Action Recognition
Skeleton-based action recognition has recently received considerable attention. Current approaches to skeleton-based action recognition are typically formulated as one-hot... -
FaMoS dataset
The FaMoS dataset records 95 subjects, each performing 28 predefined actions (e.g. anger, disgust, fear, surprise). -
N-EPIC Kitchens
Recognizing and comprehending human actions and gestures is a crucial perception requirement for robots to interact with humans and carry out tasks in diverse domains, including... -
THUMOS'13 Dataset
The THUMOS'13 dataset contains 24 classes and 3,207 videos. -
JHMDB Dataset
The JHMDB dataset contains 21 action classes spanning sports and daily activities. -
UCF Sports Dataset
The UCF Sports dataset contains 10 action classes, compared with 21 in JHMDB and 24 in THUMOS'13. Together, these datasets span sports and daily activities. -
Kinetics Skeleton 400
Kinetics Skeleton 400 is a dataset adapted from the Kinetics 400 video dataset using the OpenPose toolbox in 2D keypoint modality. -
THUMOS'14, ActivityNet v1.3
Temporal action detection in untrimmed videos via multi-stage CNNs; CDC: Convolutional-De-Convolutional networks for precise temporal action localization; Temporal action... -
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
This paper introduces the Two-Stream Inflated 3D ConvNet (I3D) model and evaluates action recognition architectures on the Kinetics human action video dataset. -
MPII Human Pose Dataset
Human pose estimation refers to the task of recognizing postures by localizing body keypoints (head, shoulders, elbows, wrists, knees, ankles, etc.) from images.