43 datasets found

  • Charades-STA dataset

    Charades-STA is a benchmark for temporal grounding of activities: identifying the specific time intervals in which actions occur within a longer video.
  • TVQA

    TVQA is a video question answering dataset collected from 6 long-running TV shows spanning 3 genres. It contains 21,793 video clips in total, each accompanied by multiple-choice question-answer pairs grounded in both the video and its subtitles.
  • Ask-Anything

    A video-centric multimodal instruction dataset, composed of thousands of videos associated with detailed descriptions and conversations.
  • VidOR

    VidOR (Video Object Relation) is a dataset of natural, user-generated videos of daily life, annotated with objects and the relations between them.
  • VQ2D

    The VQ2D dataset is a subset of Ego4D containing ground-truth tracking annotations for each query object's last appearance.
  • EgoLoc

    EgoLoc reformulates the Ego4D VQ3D task and provides a modular pipeline that yields significant improvements on the Ego4D VQ3D benchmark.
  • PLOT-TAL - Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

    A benchmark setting for few-shot Temporal Action Localization (TAL), addressing the inherent limitations of conventional single-prompt learning methods.
  • QVHighlights

    QVHighlights is a dataset for moment retrieval and video highlight detection, consisting of over 10,000 videos annotated with human-written free-form text queries.
  • Long Video Understanding Benchmark

    A benchmark aimed at long-form video understanding, introduced alongside a two-stream spatio-temporal attention network for long video classification.
  • MMX-Trailer-20 Dataset

    A dataset for long-form video understanding (LVU), a sub-domain of video recognition concerned with contextual information that spans contiguous shots.
  • Open Vocabulary Multi-Label Video Classification

    A dataset for multi-label video classification in which labels are drawn from an open, unbounded vocabulary rather than a fixed category set.
  • MovieChat

    MovieChat: From dense token to sparse memory for long video understanding.
  • Video-Chat2

    Video-Chat2: A chat-centric video understanding model, introduced together with the MVBench multi-modal video understanding benchmark.
  • Video-LLaVA

    Video-LLaVA: Learning united visual representation by alignment before projection.
  • Video-Chat

    Video-Chat: Chat-centric video understanding.
  • Video-LLaMA

    Video-LLaMA: An instruction-tuned audio-visual language model for video understanding.
  • Video-ChatGPT

    Video-ChatGPT: Towards detailed video understanding via large vision and language models.
  • ActivityNet, MSR-VTT, and MSVD

    A combined evaluation suite of the ActivityNet, MSR-VTT, and MSVD datasets, used for text-to-video retrieval tasks.
  • High-Quality Fall Simulation Dataset (HQFSD)

    The High-Quality Fall Simulation Dataset (HQFSD) is a challenging dataset for human fall detection, covering multi-person scenarios, changing lighting, occlusion, and other difficult conditions.
  • MSRVTT

    MSR-VTT is a large-scale dataset for video captioning. It contains 10k video clips, each accompanied by 20 human-edited English sentence descriptions.