WMT 2020 Sentence-Level Direct Assessment dataset

The dataset used in the WMT 2020 Sentence-Level Direct Assessment shared task consists of data extracted from Wikipedia for six language pairs: the high-resource pairs English-German (En-De) and English-Chinese (En-Zh), the medium-resource pairs Romanian-English (Ro-En) and Estonian-English (Et-En), and the low-resource pairs Sinhala-English (Si-En) and Nepalese-English (Ne-En). It also includes a seventh pair, Russian-English (Ru-En), whose data combines articles from Wikipedia and Reddit.
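
For readers who want to iterate over the language pairs programmatically, the grouping described above can be written down as a small lookup table. This is only an illustrative sketch: the names `LANGUAGE_PAIRS`, `MIXED_DOMAIN_PAIRS`, `resource_level`, and `all_pairs` are hypothetical and not part of the dataset or any official loader, and the lowercase pair codes are an assumed convention.

```python
# Hypothetical lookup table; the grouping mirrors the dataset description.
# Lowercase pair codes are an assumed convention, not an official format.
LANGUAGE_PAIRS = {
    "high": ["en-de", "en-zh"],    # high-resource, Wikipedia
    "medium": ["ro-en", "et-en"],  # medium-resource, Wikipedia
    "low": ["si-en", "ne-en"],     # low-resource, Wikipedia
}

# Ru-En combines Wikipedia and Reddit; the description assigns it
# no resource level, so it is kept separate here.
MIXED_DOMAIN_PAIRS = ["ru-en"]


def resource_level(pair):
    """Return the resource level for a pair code, or None if unspecified."""
    normalized = pair.strip().lower()
    for level, pairs in LANGUAGE_PAIRS.items():
        if normalized in pairs:
            return level
    return None


def all_pairs():
    """Return every pair code mentioned in the description."""
    pairs = [p for group in LANGUAGE_PAIRS.values() for p in group]
    return pairs + MIXED_DOMAIN_PAIRS
```

Here `resource_level("Si-En")` yields `"low"`, while `resource_level("Ru-En")` yields `None` because the description does not state a resource level for that pair.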

Cite this as

Tharindu Ranasinghe, Constantin Orăsan, Ruslan Mitkov (2024). Dataset: WMT 2020 Sentence-Level Direct Assessment dataset. https://doi.org/10.57702/xn8xolfq

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2010.05318
Author Tharindu Ranasinghe
More Authors Constantin Orăsan, Ruslan Mitkov