- WizardCoder
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models
Instruction-finetuned code language models have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language...
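The idea in the title, deriving preference pairs from test-case outcomes, can be sketched as follows. This is an illustrative reconstruction, not PLUM's actual pipeline, and all function names are hypothetical:

```python
# Illustrative sketch only (not PLUM's actual pipeline): sampled candidate
# solutions are executed against test cases, and pass/fail outcomes become
# chosen/rejected pairs for preference learning. All names are hypothetical.
from itertools import product

def passes_tests(candidate_code: str, tests: list[str]) -> bool:
    """Run a candidate solution, then its assert-style tests, in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the candidate solution
        for test in tests:
            exec(test, namespace)         # a failing assert raises AssertionError
        return True
    except Exception:
        return False

def build_preference_pairs(prompt: str, candidates: list[str], tests: list[str]) -> list[dict]:
    """Pair every passing candidate (chosen) with every failing one (rejected)."""
    graded = [(c, passes_tests(c, tests)) for c in candidates]
    passing = [c for c, ok in graded if ok]
    failing = [c for c, ok in graded if not ok]
    return [{"prompt": prompt, "chosen": good, "rejected": bad}
            for good, bad in product(passing, failing)]
```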
- MBPP dataset
MBPP (Mostly Basic Python Problems) contains crowd-sourced Python programming problems, each with a natural language description, a reference solution, and test cases for checking functional correctness.
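A minimal sketch of loading MBPP from the Hugging Face Hub and checking a reference solution against its test cases; the split and field names ("text", "code", "test_list") reflect the Hub copy and should be treated as assumptions:

```python
# Minimal sketch, assuming the Hugging Face Hub copy of MBPP with its usual
# fields ("text", "code", "test_list"); field and split names may differ.
from datasets import load_dataset

mbpp = load_dataset("mbpp", split="test")
example = mbpp[0]
print(example["text"])            # natural language task description

# Execute the reference solution, then its asserts, in one namespace.
namespace: dict = {}
exec(example["code"], namespace)
for test in example["test_list"]:
    exec(test, namespace)         # each entry is an `assert ...` statement
print("reference solution passes all test cases")
```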
- APPS: A Dataset for Code Generation Evaluation
The APPS dataset is a benchmark of 10,000 programming problems collected from coding websites, ranging from introductory exercises to competition-level tasks, used to evaluate code generation models.
- Evaluating large language models trained on code
This paper introduces OpenAI Codex and the HumanEval benchmark, and evaluates the model's ability to generate functionally correct Python code from docstrings.
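The paper also defines the pass@k metric together with an unbiased estimator; a short Python rendering of that estimator:

```python
# Unbiased pass@k estimator from "Evaluating Large Language Models Trained
# on Code": given n samples per problem of which c pass, estimate
# pass@k = 1 - C(n - c, k) / C(n, k), computed as a numerically stable product.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes."""
    if n - c < k:
        return 1.0
    return 1.0 - float(np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))

# Example usage: 200 samples per problem, 37 correct, estimate pass@10.
print(round(pass_at_k(n=200, c=37, k=10), 4))
```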
- Execution-based Evaluation for NL2Bash
A set of 50 prompts for execution-based evaluation of the NL2Bash task.
- CodeUltraFeedback
CodeUltraFeedback is a preference dataset of 10,000 complex instructions used to tune and align LLMs to coding preferences through AI feedback.
- HumanEval, MBPP, APPS
Together these code generation benchmarks cover HumanEval (164 function declarations with accompanying documentation and unit tests), the MBPP test split (500 crowd-sourced Python problems), and APPS (programming problems spanning introductory to competition level).
- Large language models of code fail at completing code with potential bugs
Code generation models fail at completing code when the given context contains potential bugs.
- SLTrans: A Source Code to LLVM IR Translation Pairs Dataset
SLTrans is a parallel dataset consisting of nearly 4M pairs of self-contained source code and corresponding LLVM IR.
- Evol-Instruct-Code-80k
Evol-Instruct-Code-80k is a dataset of roughly 80,000 evolved code instruction-response pairs, generated with the Evol-Instruct method, for instruction-tuning code language models.
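A rough sketch of how Evol-Instruct-style evolution produces such data; the prompt template below is illustrative rather than the one used to build this dataset, and `ask_llm` is a placeholder for any chat-model call:

```python
# Illustrative Evol-Instruct-style evolution: a seed coding instruction is
# rewritten into a harder variant by an LLM. The template is an assumption,
# not the dataset's actual prompt; `ask_llm` stands in for any model call.
from typing import Callable

EVOLVE_TEMPLATE = (
    "Rewrite the following programming task so that it is more complex, "
    "for example by adding constraints or requiring better time complexity, "
    "while keeping it solvable:\n\n{instruction}"
)

def evolve(instruction: str, ask_llm: Callable[[str], str], rounds: int = 1) -> list[str]:
    """Return the chain of progressively harder instructions, seed first."""
    chain = [instruction]
    for _ in range(rounds):
        chain.append(ask_llm(EVOLVE_TEMPLATE.format(instruction=chain[-1])))
    return chain
```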