Rethinking Multi-Modal Alignment in Multi-Choice VideoQA from Feature and Sample Perspectives

doi:10.57702/f4746e2g




License

No License Provided



Reasoning about causal and temporal event relations in videos is an emerging goal of Video Question Answering (VideoQA). The major obstacle to achieving it is the semantic gap between language and video, which sit at different levels of abstraction.

