Causal-VidQA
This dataset is used in the paper to evaluate the performance of the TranSTR architecture.
ActivityNet-QA
Video question answering (VideoQA) is an essential task in vision-language understanding that has recently attracted considerable research attention. Nevertheless, existing works...