-
Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sam...
Reasoning about causal and temporal event relations in videos is a new direction for Video Question Answering (VideoQA). The major stumbling block to achieving this goal is... -
Slot-VLM: SlowFast Slots for Video-Language Modeling
Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the... -
Youtube2Text-QA
A video question answering task that requires machines to answer questions about videos in natural language. -
Zero-shot video question answering via frozen bidirectional language models