IMAGINE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation

Automatic evaluations for natural language generation (NLG) conventionally rely on token-level or embedding-level comparisons with the text references. This is different from human language processing, for which visual imagination often improves comprehension.

BibTex: