Audio-Visual Question Answering

Audio-visual question answering (AVQA) requires reference to video content and auditory information, followed by correlating the question to predict the most precise answer.

BibTex: