
Learning to Evaluate Image Captioning

Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments. Second, each metric has well-known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once identified.
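
To make the first challenge concrete, here is a minimal sketch (illustrative only, not part of this dataset or the paper's method) using NLTK's sentence-level BLEU: a caption that swaps in the wrong object but preserves surface n-grams outscores a faithful paraphrase.

```python
# A minimal sketch, assuming NLTK is installed (pip install nltk), showing how
# surface n-gram metrics can misrank captions relative to human judgment.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "a man riding a horse on the beach".split()
paraphrase = "a person rides a horse along the shore".split()  # semantically faithful
wrong_object = "a man riding a cow on the beach".split()       # wrong animal, high overlap

smooth = SmoothingFunction().method1  # avoids zero scores when an n-gram order has no match
for name, hypothesis in [("paraphrase", paraphrase), ("wrong object", wrong_object)]:
    score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
    print(f"{name}: BLEU = {score:.3f}")

# Typical output: the wrong-object caption scores far higher (~0.5) than the
# correct paraphrase (~0.1), illustrating the correlation problem noted above.
```

Because BLEU measures only clipped n-gram precision against the references, any hypothesis that reuses the reference's exact wording is rewarded, regardless of whether the described content is correct.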

Data and Resources

This dataset has no data

Cite this as

Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie (2024). Dataset: Learning to Evaluate Image Captioning. https://doi.org/10.57702/kjptkcj4

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Defined In    https://doi.org/10.48550/arXiv.1806.06422
Author        Yin Cui
More Authors  Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
Homepage      https://arxiv.org/abs/1805.11575