TL;DR Reddit corpus

doi:10.57702/2heldboi

The TL;DR Reddit corpus consists of approximately 3 million content-summary pairs mined from Reddit, used for training and evaluating summarization models.

Data and Resources

No data files are attached to this dataset record.

Cite this as

Michael Völske, Martin Potthast, Shahbaz Syed, Benno Stein (2024). Dataset: TL;DR Reddit corpus. https://doi.org/10.57702/2heldboi

DOI retrieved: November 25, 2024

Additional Info

Field	Value
Created	November 25, 2024
Last update	November 25, 2024
Defined In	https://doi.org/10.1016/j.websem.2020.100596
Citation	https://doi.org/10.1016/j.websem.2022.100755
Author	Michael Völske
More Authors	Martin Potthast, Shahbaz Syed, Benno Stein