Tags: code evaluation

  • LiveCodeBench

    LiveCodeBench is a benchmark for evaluating the performance of Large Language Models (LLMs) in code editing tasks, including debugging, translating, polishing, and requirement...