Video Understanding - Groups

InterVid-14M-aesthetics

The dataset used in the paper is InterVid-14M-aesthetics, which is a subset of InterVid-14M used to remove watermarks from generated videos.

Dataset
JSON

Kinetics-400 and Kinetics-600

The Kinetics-400 and Kinetics-600 datasets are video understanding datasets used for learning rich and multi-scale spatiotemporal semantics from high-dimensional videos.

Dataset
JSON

MSR-VTT

The dataset used in the paper is MSR-VTT, a large video description dataset for bridging video and language. The dataset contains 10k video clips with length varying from 10 to...

Dataset
JSON

UCF101

The UCF101 dataset contains 13320 videos distributed in 101 action categories. This dataset is different from the above ones in that it contains mostly coarse sports activities...

Dataset
JSON

4 datasets found

InterVid-14M-aesthetics

Kinetics-400 and Kinetics-600

MSR-VTT

UCF101