Dataset - LDM

PLUM: Preference Learning Plus Test Cases Yields Better Code Language Models

Instruction-finetuned code language models have shown promise in various programming tasks. They are trained, using a language modeling objective, on natural language...
- Dataset
- JSON
Search-based Pseudocode to Code

The SPoC dataset contains 18,356 C++ programs with human-authored pseudocode and test cases.
- Dataset
- JSON
MBPP dataset

MBPP (Mostly Basic Python Problems) is a dataset of code snippets and their corresponding test cases.
- Dataset
- JSON
