Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives

Reasoning about causal and temporal event relations in videos is an emerging goal of Video Question Answering (VideoQA). The major stumbling block to achieving this goal is the semantic gap between language and video, since the two modalities sit at different levels of abstraction.

Data and Resources

This dataset has no data

Cite this as

Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao (2024). Dataset: Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives. https://doi.org/10.57702/f4746e2g

Private DOI: this DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field: Value
Created: December 16, 2024
Last update: December 16, 2024
Defined in: https://doi.org/10.48550/arXiv.2204.11544
Authors: Shaoning Xiao, Long Chen, Kaifeng Gao, Zhao Wang, Yi Yang, Zhimeng Zhang, Jun Xiao