3 datasets found

  • LiveCodeBench

    LiveCodeBench is a contamination-free benchmark for evaluating Large Language Models (LLMs) on code. It continuously collects new problems from competition platforms and covers code generation, self-repair, code execution, and test output prediction.
  • Evol-Instruct-Code-80k

    Evol-Instruct-Code-80k is a dataset of roughly 80,000 code instructions evolved with the Evol-Instruct method, intended for fine-tuning code generation models.
  • HumanEvalFix

    HumanEvalFix is a benchmark for evaluating code repair: given a buggy function and its unit tests, a model must produce a fix. It is part of the HumanEvalPack suite.
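
If these datasets are mirrored on the Hugging Face Hub, they can usually be loaded with the `datasets` library. The sketch below makes that assumption; the repository IDs, config names, and splits are guesses and should be verified on the Hub before use.

    # Minimal sketch, assuming the datasets are hosted on the Hugging Face Hub.
    # The repository IDs, config names, and splits below are assumptions --
    # check the actual Hub pages before relying on them.
    from datasets import load_dataset

    # Assumed ID for the Evol-Instruct code instruction data (train split).
    evol_instruct = load_dataset("nickrosh/Evol-Instruct-Code-80k-v1", split="train")

    # HumanEvalFix is assumed here to be the "python" config of HumanEvalPack;
    # script-based datasets may need trust_remote_code on recent versions.
    humaneval_fix = load_dataset(
        "bigcode/humanevalpack", "python", split="test", trust_remote_code=True
    )

    # LiveCodeBench ships its own loading code and release tags, so it typically
    # needs extra options; consult its Hub page rather than guessing here.

    print(evol_instruct[0])   # expected fields: instruction, output
    print(humaneval_fix[0])   # includes buggy code, canonical fix, and tests

Printing one record from each split is a quick sanity check that the assumed IDs and splits resolve to the expected schema.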