HumanEval, MBPP, APPS
The paper uses three code generation benchmarks: HumanEval, which consists of 164 function declarations alongside their documentation; MBPP, which provides 500 test examples, each an instruction for a code function; and APPS, which contains 5k programming problems at various levels of difficulty.
BibTeX: