Video Understanding - Groups - LDM

TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-...

TOPA is a text-only pre-alignment framework that extends large language models to video understanding without pre-training on real video data.
- Dataset
- JSON
UCF101

The UCF101 dataset contains 13,320 videos across 101 action categories. This dataset is different from the above ones in that it contains mostly coarse sports activities...
- Dataset
- JSON
Kinetics-400, Something-Something-V2, and Epic-Kitchens-100

The authors used the Kinetics-400, Something-Something-V2, and Epic-Kitchens-100 datasets for video understanding tasks.
- Dataset
- JSON
