Dataset - LDM

Piazza QA dataset

A dataset of 50 question-answer pairs from a programming languages course at a large public university.
- Dataset
- JSON
Various Datasets

The paper uses the following datasets: WikiMIA, BookMIA, Temporal Wiki, Temporal arXiv, ArXiv-1 month, Multi-Webdata, LAION-MI, and Gutenberg.
- Dataset
- JSON
POJ-104 Dataset

The POJ-104 dataset is a collection of 104 program classes, each containing 500 randomly selected programs written by different people.
- Dataset
- JSON
OSCAR Dataset

A large corpus of real-world programs, used in the paper to pre-train a neural network model that learns better code representations.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

4 datasets found