OSCAR Dataset

The dataset used in the paper is a large corpus of real-world programs, used to pre-train a neural network model to learn better code representations.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Dinglan Peng, Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu (2024). Dataset: OSCAR Dataset. https://doi.org/10.57702/gp2sfidn

DOI retrieved: December 2, 2024

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Defined In	https://doi.org/10.48550/arXiv.2105.04297
Author	Dinglan Peng
More Authors	Shuxin Zheng, Yatao Li, Guolin Ke, Di He, Tie-Yan Liu
Homepage	https://github.com/pdlan/OSCAR