DocRepair dataset

The dataset used for testing the DocRepair model. It contains 30 million groups of four consecutive sentences in English and Russian.

Cite this as

Elena Voita, Rico Sennrich, Ivan Titov (2024). Dataset: DocRepair dataset. https://doi.org/10.57702/3qeel4sd

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined In     https://doi.org/10.48550/arXiv.1909.01383
Author         Elena Voita
More Authors   Rico Sennrich, Ivan Titov
Homepage       https://github.com/lena-voita/DocRepair