-
OTB2013, OTB2015, VOT2015, VOT2016
Visual object tracking, which automatically follows a specified target through a changing video sequence, is a fundamental problem in many areas such as visual analysis, automatic... -
WorldExpo'10
A crowd counting dataset with over 1000 labeled videos captured by over 100 monitoring cameras. -
Crowd Video Captioning Dataset
A crowd video captioning dataset based on the WorldExpo'10 dataset, with 98 videos selected and captions generated for them. -
VCDB: A Large-Scale Database for Partial Copy Detection in Videos
A large-scale database for partial copy detection in videos. -
Breast Ultrasound Video Diagnosis
The dataset is used for breast cancer diagnosis in ultrasound videos. -
Tinyvideos dataset
Tinyvideos dataset -
VisDrone2019 Dataset
The VisDrone2019 dataset contains 288 video clips comprising 261,908 frames, plus 10,209 static images. -
Day-to-Day Video Dataset
A dataset of 30 videos, each 3 to 20 minutes long, covering five classes of daily activities: socializing, home repair, biking around urban areas, cooking, and home tours. -
ActivityNet1.2
The ActivityNet1.2 dataset is a large-scale benchmark for action recognition and localization in videos. -
EPIC-KITCHENS
EPIC-KITCHENS is a large-scale egocentric video benchmark recorded by 32 participants in their native kitchen environments. Our videos depict non-scripted daily activities: we... -
Temporal Sentence Grounding in Videos
Temporal sentence grounding in videos (TSGV) is the task of retrieving the video segment that semantically corresponds to a natural-language query. -
WildDeepFakes
A challenging real-world dataset for deepfake detection. -
Temporal Deepfake Segment Benchmark
A deepfake detection method that can address the issue of modifying segments of videos using generative techniques. -
AGRE ADOS database, Kaggle database, and self-gathered video test dataset
The AGRE ADOS database, a Kaggle database, and a self-gathered video test dataset with corresponding ADOS data. -
InternVid: A Large-Scale Video-Text Dataset for Multimodal Understanding and ...
InternVid: A large-scale video-text dataset for multimodal understanding and generation.