ActivityNet-QA
Video question answering (VideoQA) is an essential task in vision-language understanding, which has attracted considerable research attention recently. Nevertheless, existing works...
HealthVidQA-Prompt
The HealthVidQA-Prompt dataset is a large-scale medical instructional video question-answering dataset. It contains 52,771 video-question-answer triplets from 13,990 medical...
HealthVidQA-CRF
The HealthVidQA-CRF dataset is a large-scale medical instructional video question-answering dataset. It contains 23,434 video-question-answer triplets from 11,708 medical videos.
KnowIT VQA
A video story question answering dataset containing 24,282 questions about 207 episodes of The Big Bang Theory. -
Progressive Graph Attention Network for Video Question Answering
Slot-VLM: SlowFast Slots for Video-Language Modeling
Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the...
Youtube2Text-QA
A video question answering task that requires machines to answer questions about videos in natural language.
Zero-shot video question answering via frozen bidirectional language models