NLVR2 and OKVQA-S
NLVR2 is a challenging VQA dataset that requires the model to compare, locate, and count objects based on the given question and a pair of images. OKVQA-S is a challenging category of...
Mixture of Rationales (MoR) for Visual Question Answering
Zero-shot visual question answering (VQA) is a challenging task that requires reasoning across modalities. While some existing methods rely on a single rationale within the...