- LiveCodeBench
  LiveCodeBench is a contamination-free benchmark for evaluating the performance of Large Language Models (LLMs) on code. It continuously collects new problems from competitive programming platforms and covers code generation, self-repair, code execution, and test output prediction.
- Evol-Instruct-Code-80k
  Evol-Instruct-Code-80k is an instruction-tuning dataset of roughly 80,000 code instruction-response pairs generated with the Evol-Instruct method; it is commonly used to fine-tune code generation models.
- HumanEvalFix
  HumanEvalFix is a benchmark for evaluating the performance of code repair models: each task pairs a function containing a subtle bug with unit tests, and the model must produce a corrected implementation that passes them (an illustrative task instance follows this list). It is part of HumanEvalPack and spans six programming languages.
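To make the repair setting concrete, here is a minimal, hypothetical sketch of a HumanEvalFix-style item: a buggy function, a reference fix, and the unit tests a candidate repair must pass. The function, names, and test values below are illustrative assumptions, not items copied from the benchmark.

```python
# Hypothetical HumanEvalFix-style task (illustrative, not taken verbatim
# from the benchmark): the prompt pairs a buggy function with unit tests,
# and the model must output a fixed version that passes them.

def buggy_has_close_elements(numbers, threshold):
    """Return True if any two distinct numbers are closer than `threshold`."""
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j and a - b < threshold:  # bug: signed difference, missing abs()
                return True
    return False

def fixed_has_close_elements(numbers, threshold):
    """Reference repair: compare absolute distances."""
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            if i != j and abs(a - b) < threshold:
                return True
    return False

def check(candidate):
    """Unit tests a candidate repair must pass."""
    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) is True
    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) is False
    assert candidate([1.0, 2.0, 5.9, 4.0, 5.0], 0.95) is True

check(fixed_has_close_elements)   # passes
# check(buggy_has_close_elements) # would raise AssertionError on the second test
```

Scoring on such items is typically execution-based: the model's repaired function is run against the hidden tests, and a task counts as solved only if all assertions pass.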