3 datasets found

Groups: Hallucination Evaluation

  • HallusionBench

    HallusionBench is an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models.
  • VALOR-BENCH

    VALOR-BENCH is a comprehensive human-annotated dataset covering hallucinations in large vision-language models, with a focus on measuring hallucinations in generative tasks.
  • HaluEval-Sum

    HaluEval-Sum is a large-scale hallucination evaluation benchmark for large language models.