1 dataset found
Tags: evaluation

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.
Dataset JSON