Evaluating large language models trained on code

The paper presents the evaluation of OpenAI Codex, a GPT language model fine-tuned on publicly available code from GitHub, on the task of generating Python code from docstrings; functional correctness is measured on the HumanEval benchmark using the pass@k metric.
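pass@k is estimated from n generated samples per problem, of which c pass the unit tests. Below is a minimal sketch of the unbiased estimator described in the paper; the function name pass_at_k and the use of NumPy are illustrative choices for this record, not part of the dataset itself.

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased estimate of pass@k: the probability that at least one of k
        samples drawn without replacement from n generations is correct, given
        that c of the n generations pass the unit tests."""
        if n - c < k:
            # Every size-k subset of the n samples must contain a correct one.
            return 1.0
        # 1 - C(n-c, k) / C(n, k), expanded as a numerically stable product.
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Example (hypothetical numbers): 200 samples per problem, 37 correct, pass@10.
    print(pass_at_k(200, 37, 10))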

Cite this as

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman (2024). Dataset: Evaluating large language models trained on code. https://doi.org/10.57702/wbv3e61b

DOI retrieved: December 16, 2024

Additional Info

Field         Value
Created       December 16, 2024
Last update   December 16, 2024
Defined In    https://doi.org/10.48550/arXiv.2407.15343
Citation      https://doi.org/10.48550/arXiv.2205.07634
Author        Mark Chen
More Authors  Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman
Homepage      https://arxiv.org/abs/2107.03374