- Causal-VidQA
  This dataset is used in the paper to evaluate the performance of the TranSTR architecture.
- ActivityNet-QA
  Video question answering (VideoQA) is an essential task in vision-language understanding that has recently attracted considerable research attention; ActivityNet-QA is a large-scale benchmark for it.
- STAR: A Benchmark for Situated Reasoning in Real-World Videos
  The STAR dataset provides 60K situated reasoning questions based on 22K trimmed situation video clips.
- AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
  The AGQA benchmark is a visual dataset comprising 192M questions, generated from hand-crafted programs, about 9.6K videos from the Charades dataset.
- Learning to Predict Situation Hyper-Graphs for Video Question Answering
  The SHG-VQA model predicts a situation hyper-graph structure composed of the actions and relations present in the input video.
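  A situation hyper-graph like the one SHG-VQA predicts can be pictured as a sequence of per-segment states, each holding the actions and (subject, predicate, object) relations observed there. The sketch below is purely illustrative; the class and field names are assumptions, not the paper's actual code.

  ```python
  from dataclasses import dataclass, field

  # Illustrative structure only; not SHG-VQA's actual implementation.
  @dataclass
  class Situation:
      """One video segment's state: its actions plus entity relations."""
      actions: set = field(default_factory=set)
      # Relations as (subject, predicate, object) triplets,
      # e.g. ("person", "holding", "cup")
      relations: set = field(default_factory=set)

  @dataclass
  class SituationHyperGraph:
      """Sequence of situations predicted from an input video."""
      situations: list = field(default_factory=list)

      def all_actions(self) -> set:
          # Union of actions across every situation in the clip
          out = set()
          for s in self.situations:
              out |= s.actions
          return out

  # Toy two-situation clip
  g = SituationHyperGraph([
      Situation(actions={"pick_up"}, relations={("person", "holding", "cup")}),
      Situation(actions={"drink"}, relations={("person", "drinking_from", "cup")}),
  ])
  print(sorted(g.all_actions()))  # ['drink', 'pick_up']
  ```

  Answering a question then amounts to reasoning over this predicted structure rather than over raw pixels.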
- KnowIT VQA
  A video story question answering dataset containing 24,282 questions about 207 episodes of The Big Bang Theory.
- Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
  This paper proposes a video question answering model that effectively integrates multi-modal input sources and identifies the temporally relevant information needed to answer questions.
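  The core idea of frame-selection gating can be sketched in a few lines: score each frame feature against the question embedding, turn the scores into softmax gates, and take a gate-weighted sum so temporally relevant frames dominate the summary. This is a minimal sketch under assumed names and plain dot-product scoring, not the paper's actual architecture.

  ```python
  import math

  # Illustrative sketch of frame-selection gating (assumed names,
  # dot-product relevance); not the paper's actual code.
  def select_frames(frame_feats, question_feat):
      # Relevance score per frame against the question embedding
      scores = [sum(f * q for f, q in zip(frame, question_feat))
                for frame in frame_feats]
      m = max(scores)                          # numerical stability
      exps = [math.exp(s - m) for s in scores]
      total = sum(exps)
      gates = [e / total for e in exps]        # softmax over frames
      # Gate-weighted sum: a question-conditioned video summary
      dim = len(frame_feats[0])
      return [sum(g * frame[d] for g, frame in zip(gates, frame_feats))
              for d in range(dim)]

  frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 frames, 2-dim features
  question = [1.0, 0.0]                          # question attends to dim 0
  summary = select_frames(frames, question)
  print(len(summary))  # 2
  ```

  Because the question vector aligns with the first feature dimension, the gates weight the first frame most heavily, and the summary leans toward its features.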