-
VideoStreaming
VideoStreaming is a novel approach to tackling the complexities of long video understanding with large language models (LLMs). Our proposed memory-propagated streaming encoding architecture...
-
Kinetics-400, UCF101, HMDB51, Something-Something V1, and Something-Something V2
The Kinetics-400, UCF101, HMDB51, Something-Something V1, and Something-Something V2 datasets are used to evaluate the performance of the Bi-Calibration Networks.
-
Kinetics-400
Motion has been shown to be useful for video understanding, where it is typically represented by optical flow. However, computing flow from video frames is very time-consuming....
-
TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-...
TOPA is a text-only pre-alignment framework that extends large language models to video understanding without pre-training on real video data.
-
Kinetics-400, Something-Something-V2, and Epic-Kitchens-100
The authors used the Kinetics-400, Something-Something-V2, and Epic-Kitchens-100 datasets for video understanding tasks.