-
HallusionBench
HallusionBench is an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models. -
VALOR-BENCH
VALOR-BENCH is a comprehensive human-annotated dataset covering hallucinations in large vision-language models, with a focus on measuring hallucinations in generative tasks. -
TruthfulQA
The TruthfulQA dataset is a dataset that contains 817 questions designed to evaluate language models' preference to mimic some human falsehoods. -
Detecting Hallucinated Content in Conditional Neural Sequence Generation
Neural sequence models can generate highly fluent sentences, but recent studies have also shown that they are also prone to hallucinate additional content not supported by the...