Pile

The Pile dataset consists of 800GB of text from 22 domains. Cynical selection naturally prefers text that resembles the target corpus.

BibTeX: