WikiSplit

A large dataset of naturally occurring sentence rewrites from Wikipedia edit history, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus.
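The comparison above rests on two corpus statistics: the number of distinct split examples and the vocabulary size. The sketch below shows one way such figures could be computed. It is a minimal illustration, not the method behind the published numbers. It assumes each record is a tab-separated line holding a complex sentence and its rewrite, with the rewritten sentences joined by " <::::> ", and it approximates vocabulary with lowercased whitespace tokens. The names `parse_line`, `corpus_stats`, `SPLIT_SEPARATOR` and the toy sentences are hypothetical.

```python
"""Hypothetical sketch: corpus statistics for a split-and-rephrase dataset.

Assumed record layout (verify against the actual release):
    complex_sentence <TAB> simple_1 <::::> simple_2 ...
"""
from typing import Iterable, Set, Tuple

SPLIT_SEPARATOR = " <::::> "  # assumed separator between rewritten sentences


def parse_line(line: str) -> Tuple[str, Tuple[str, ...]]:
    """Split one record into (complex sentence, tuple of simple sentences)."""
    complex_sent, split_field = line.rstrip("\n").split("\t", 1)
    parts = tuple(s.strip() for s in split_field.split(SPLIT_SEPARATOR))
    return complex_sent.strip(), parts


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number of distinct split examples, vocabulary size).

    Vocabulary is approximated by lowercased whitespace tokens, a
    simplification; published figures may use a different tokenizer.
    """
    examples: Set[Tuple[str, Tuple[str, ...]]] = set()
    vocab: Set[str] = set()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        complex_sent, parts = parse_line(line)
        examples.add((complex_sent, parts))  # exact duplicates collapse here
        for text in (complex_sent, *parts):
            vocab.update(tok.lower() for tok in text.split())
    return len(examples), len(vocab)


# Toy records (invented for illustration); the second line duplicates the first.
TOY_LINES = [
    "The cat sat and the dog ran .\tThe cat sat . <::::> The dog ran .",
    "The cat sat and the dog ran .\tThe cat sat . <::::> The dog ran .",
    "Ann sings because she is happy .\tAnn sings . <::::> She is happy .",
]

print(corpus_stats(TOY_LINES))  # → (2, 13)
```

Running the same function over two corpora and dividing the results gives ratios of the kind quoted in the description; counting distinct pairs rather than raw lines keeps repeated edits from inflating the example count.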

Cite this as

Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh (2024). Dataset: WikiSplit. https://doi.org/10.57702/io84pkq6

DOI retrieved: December 2, 2024

Additional Info

Field          Value
Created        December 2, 2024
Last update    December 2, 2024
Defined In     https://doi.org/10.48550/arXiv.2308.00425
Citation       https://doi.org/10.48550/arXiv.1808.09468
Author         Christina Niklaus
More Authors   Matthias Cetto, André Freitas, Siegfried Handschuh
Homepage       https://github.com/Lambda-3/DiscourseSimplification