-
Piazza QA dataset
A dataset of 50 question-answer pairs from a programming languages course at a large public university. -
Various Datasets
The paper uses the following datasets: WikiMIA, BookMIA, Temporal Wiki, Temporal arXiv, ArXiv-1 month, Multi-Webdata, LAION-MI, and Gutenberg. -
POJ-104 Dataset
The POJ-104 dataset contains 104 program classes, each with 500 randomly selected programs written by different people. -
OSCAR Dataset
A large corpus of real-world programs, used in the paper to pre-train a neural network model that learns better code representations.