M4

The M4 dataset consists of human-written texts from several data sources; its English subset draws on sources including Wikipedia, Reddit, and arXiv. It pairs these human-written texts with texts generated by several LLMs, including text-davinci-003 (henceforth GPT-3.5), GPT-4, and ChatGPT.

BibTeX: