Video Understanding - Groups

VQ2D

The VQ2D dataset is a subset of the Ego4D dataset, containing ground truth tracking annotations for the query object's last appearance.

EgoLoc

EgoLoc is a reformulation of the VQ3D task, paired with a modular pipeline that leads to significant improvements on the Ego4D VQ3D benchmark.

MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...

MSR-VTT

The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...

UCF101

The UCF101 dataset contains 13,320 videos across 101 action categories. This dataset is different from the above ones in that it contains mostly coarse sports activities...

5 datasets found
