ClinicalLab: A Comprehensive Clinical Diagnosis Agent Alignment Suite

Large language models (LLMs) have made significant progress across natural language processing applications. However, they still struggle to meet the strict accuracy and reliability requirements of the medical field and face many challenges in clinical applications. Existing benchmarks for evaluating LLM-powered medical agents on clinical diagnosis have severe limitations. First, most are at risk of data leakage or contamination. Second, they often neglect the multiple departments and specializations that characterize modern medical practice. Third, their evaluation methods are limited to multiple-choice questions, which do not reflect real-world diagnostic scenarios. Finally, they lack comprehensive evaluation of end-to-end real clinical scenarios. These limitations in turn obstruct the advancement of LLMs and agents for medicine. To address them, we introduce ClinicalLab, a comprehensive clinical diagnosis agent alignment suite that includes ClinicalBench, ClinicalMetrics, and ClinicalAgent, to promote the development of clinical diagnostic agents.

Cite this as

Weixiang Yan, Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu (2024). Dataset: ClinicalLab: A Comprehensive Clinical Diagnosis Agent Alignment Suite. https://doi.org/10.57702/bo4xy1bt

DOI retrieved: December 3, 2024

Additional Info

Created: December 3, 2024
Last update: December 3, 2024
Defined In: https://doi.org/10.48550/arXiv.2406.13890
Author: Weixiang Yan
More Authors: Haitian Liu, Tengxiao Wu, Qian Chen, Wen Wang, Haoyuan Chai, Jiayi Wang, Weishan Zhao, Yixin Zhang, Renjun Zhang, Li Zhu