-
NTU-120 minus 60
A large-scale labeled RGB-Depth dataset for action recognition, containing 942 training and 132 validation videos for 51 action classes.
-
Anti-UAV: A Large Multi-Modal Benchmark for UAV Tracking
A large multi-modal benchmark for UAV tracking, containing high-quality, high-definition RGB and IR video sequences, each annotated with bounding boxes, attributes,...
-
MANUS-Grasps
MANUS-Grasps is a large real-world multi-view RGB grasp dataset with over 7M frames from 53 cameras, providing full 360-degree coverage of 400+ grasps in over 30 diverse...
-
SEDS: Semantically Enhanced Dual-Stream Encoder for Sign Language Retrieval
Sign language retrieval leans more heavily on understanding the semantic information of the human actions contained in video clips. The proposed framework addresses these issues by...
-
Centered-Shoe
A unified object-centric implicit representation that can be used for RGB and depth novel view rendering, 3D reconstruction, and proposing stable grasps.
-
Spike dataset
The dataset used in the paper is a spike dataset generated from RGB frames of four open-access outdoor datasets, including KITTI, Driving Stereo, Driving Stereo Weather, and...
-
KITTI dataset
The dataset used in the paper is the KITTI dataset, which is a benchmark for monocular depth estimation. The dataset consists of a large collection of images and corresponding...