GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

Automatically evaluating vision-language tasks is challenging, especially when the evaluation is meant to reflect human judgments, because automatic methods are limited in how well they account for fine-grained details.

BibTeX: