Learning to Evaluate Image Captioning

Evaluation metrics for image captioning face two challenges. First, commonly used metrics such as CIDEr, METEOR, ROUGE, and BLEU often do not correlate well with human judgments. Second, each metric has well-known blind spots to pathological caption constructions, and rule-based metrics lack provisions to repair such blind spots once they are identified.
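To illustrate the kind of blind spot a rule-based metric can have, the sketch below computes clipped n-gram precision, the core quantity behind BLEU. The example sentences and the helper names (`ngrams`, `modified_precision`) are illustrative assumptions, not drawn from the dataset itself: a caption built by reordering phrases of the reference scores near-perfectly on n-gram precision even though it is not a natural caption.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision, the core quantity behind BLEU-n."""
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    ref_counts = Counter(ngrams(reference.split(), n))
    clipped = sum(min(c, ref_counts[g]) for g, c in Counter(cand).items())
    return clipped / len(cand)

reference = "a man is riding a horse on the beach"
# Hypothetical pathological caption: the reference's phrases, reordered.
shuffled = "on the beach a man is riding a horse"

print(modified_precision(shuffled, reference, 1))  # 1.0
print(modified_precision(shuffled, reference, 2))  # 0.875
```

Because every word and almost every bigram of the reordered caption appears in the reference, n-gram precision stays high; a learned, discriminative evaluation metric of the kind proposed in this work can be retrained to penalize such constructions.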

Data and Resources

Cite this as

Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie (2024). Dataset: Learning to Evaluate Image Captioning. https://doi.org/10.57702/kjptkcj4

DOI retrieved: December 16, 2024

Additional Info

Field        Value
Created      December 16, 2024
Last update  December 16, 2024
Defined in   https://doi.org/10.48550/arXiv.1806.06422
Authors      Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie
Homepage     https://arxiv.org/abs/1805.11575