PANDA (Pedantic ANswer-correctness Determination and Adjudication)

Question answering (QA) can only make progress if we know whether an answer is correct, but for many of the most challenging and interesting QA examples, current answer correctness (AC) metrics do not align with human judgments, particularly for verbose, free-form answers from large language models.

BibTeX: