Learning to Evaluate Image Captioning

Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments. Second, each metric has well-known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once they are identified.
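To illustrate the kind of blind spot a rule-based metric can have, the sketch below computes clipped n-gram precision, the core quantity behind BLEU. The example sentences and the helper names (`ngrams`, `modified_precision`) are illustrative assumptions, not drawn from the dataset itself: a caption built by reordering phrases of the reference scores near-perfectly on n-gram precision even though it is not a natural caption.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, the core quantity behind BLEU-n."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    ref_counts = Counter(ngrams(reference.split(), n))
    clipped = sum(min(c, ref_counts[g]) for g, c in Counter(cand).items())
    return clipped / len(cand)

reference = "a man is riding a horse on the beach"
# Hypothetical pathological caption: the reference's phrases, reordered.
shuffled = "on the beach a man is riding a horse"

print(modified_precision(shuffled, reference, 1))  # 1.0
print(modified_precision(shuffled, reference, 2))  # 0.875
```

Because every word and almost every bigram of the reordered caption appears in the reference, n-gram precision stays high; a learned, discriminative evaluation metric of the kind proposed in this work can be retrained to penalize such constructions.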

Data and Resources

Cite this as

Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie (2024). Dataset: Learning to Evaluate Image Captioning. https://doi.org/10.57702/kjptkcj4

DOI retrieved: December 16, 2024

Additional Info

Field        Value
Created      December 16, 2024
Last update  December 16, 2024
Defined in   https://doi.org/10.48550/arXiv.1806.06422
Authors      Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
Homepage     https://arxiv.org/abs/1805.11575