Multimodal Robustness Benchmark

The MMR benchmark is designed to evaluate MLLMs' comprehension of visual content and their robustness against misleading questions, ensuring that models truly leverage multimodal inputs rather than relying solely on textual reasoning. The accompanying MMR-data training set is built to enhance MLLMs' understanding capability and robustness by providing paired positive and negative visual question-answer samples.
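
To illustrate the paired positive/negative structure, here is a minimal sketch of what a single training entry might look like. The field names and example contents are hypothetical assumptions for illustration only and do not reflect the released data format.

```python
# A hypothetical paired positive/negative sample, sketched for illustration.
# Field names ("image", "positive", "negative", ...) are assumptions,
# not the official MMR-data schema.
import json

paired_sample = {
    "image": "images/000123.jpg",
    "positive": {
        # A question grounded in content actually present in the image.
        "question": "What color is the umbrella the woman is holding?",
        "answer": "Red.",
    },
    "negative": {
        # A misleading question that presupposes content absent from the image;
        # a robust model should reject the false premise instead of hallucinating.
        "question": "What brand is printed on the umbrella the man is holding?",
        "answer": "There is no man holding an umbrella in the image.",
    },
}

print(json.dumps(paired_sample, indent=2))
```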

BibTeX: