Evaluating large language models trained on code

The paper presents the evaluation of OpenAI Codex, a GPT language model fine-tuned on publicly available code from GitHub, on the task of generating Python code from docstrings; functional correctness is measured on the HumanEval benchmark using the pass@k metric.
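pass@k is estimated from n generated samples per problem, of which c pass the unit tests. Below is a minimal sketch of the unbiased estimator described in the paper; the function name pass_at_k and the use of NumPy are illustrative choices for this record, not part of the dataset itself.

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased estimate of pass@k: the probability that at least one of k
        samples drawn without replacement from n generations is correct, given
        that c of the n generations pass the unit tests."""
        if n - c < k:
            # Every size-k subset of the n samples must contain a correct one.
            return 1.0
        # 1 - C(n-c, k) / C(n, k), expanded as a numerically stable product.
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Example (hypothetical numbers): 200 samples per problem, 37 correct, pass@10.
    print(pass_at_k(200, 37, 10))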

Cite this as

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman (2024). Dataset: Evaluating large language models trained on code. https://doi.org/10.57702/wbv3e61b

DOI retrieved: December 16, 2024

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Defined In    https://doi.org/10.48550/arXiv.2407.15343
Citation      https://doi.org/10.48550/arXiv.2205.07634
Author        Mark Chen
More Authors  Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman
Homepage      https://arxiv.org/abs/2107.03374