
TL;DR Reddit corpus

The TL;DR Reddit corpus consists of approximately 3 million content-summary pairs mined from Reddit, used for training and evaluating summarization models.

Data and Resources

No data files are attached to this record.

Cite this as

Michael Völske, Martin Potthast, Shahbaz Syed, Benno Stein (2024). Dataset: TL;DR Reddit corpus. https://doi.org/10.57702/2heldboi

DOI retrieved: November 25, 2024

Additional Info

Created: November 25, 2024
Last update: November 25, 2024
Defined in: https://doi.org/10.1016/j.websem.2020.100596
Citation: https://doi.org/10.1016/j.websem.2022.100755
Author: Michael Völske
More authors: Martin Potthast, Shahbaz Syed, Benno Stein