TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-...
TOPA is a text-only pre-alignment framework that extends large language models to video understanding without pre-training on real video data.
Kinetics-400, Something-Something-V2, and Epic-Kitchens-100
The authors used the Kinetics-400, Something-Something-V2, and Epic-Kitchens-100 datasets for video understanding tasks.