Modality-Aware Integration with Large Language Models for Knowledge-based Visual Question Answering

Knowledge-based visual question answering (KVQA), the task of answering visual questions with the help of external knowledge such as knowledge graphs (KGs), has been extensively studied.

BibTeX: