arXiv dataset

The dataset used in this paper is a collection of arXiv papers, filtered to include only those that are written in English, have LaTeX source available, compile on a modern LaTeX distribution, and contain at least one theorem or proof environment.
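
The last criterion can be illustrated with a minimal sketch. The original filtering code is not shown here, so the helper name `has_theorem_or_proof`, the regular expression, and the comment-stripping step are all assumptions. It only matches environments literally named `theorem` or `proof`, and it does not handle edge cases such as `\\%`.

```python
import re

# Hypothetical pattern: matches the opening of a theorem or proof environment.
# Custom environment names (e.g. "thm") are NOT covered by this sketch.
ENV_PATTERN = re.compile(r"\\begin\{(theorem|proof)\}")

# Matches an unescaped % and everything after it on the same line.
COMMENT_PATTERN = re.compile(r"(?<!\\)%.*")


def has_theorem_or_proof(latex_source: str) -> bool:
    """Return True if the source opens at least one theorem or proof environment."""
    # Remove LaTeX comments first so commented-out environments are ignored.
    uncommented = COMMENT_PATTERN.sub("", latex_source)
    return ENV_PATTERN.search(uncommented) is not None


if __name__ == "__main__":
    sample = "\\begin{theorem} Every bounded sequence ... \\end{theorem}"
    print(has_theorem_or_proof(sample))
```

A check like this would typically run only on sources that have already passed the language and compilation filters, since it is far cheaper than compiling a paper.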

BibTeX: