WMT 2020 Sentence-Level Direct Assessment dataset

The dataset used in the Sentence-Level Direct Assessment shared task is composed of data extracted from Wikipedia for six language pairs: the high-resource pairs English-German (En-De) and English-Chinese (En-Zh), the medium-resource pairs Romanian-English (Ro-En) and Estonian-English (Et-En), and the low-resource pairs Sinhala-English (Si-En) and Nepalese-English (Ne-En). It also includes a Russian-English (Ru-En) dataset that combines articles from Wikipedia and Reddit.

Data and Resources

This dataset has no data

Cite this as

Tharindu Ranasinghe, Constantin Orăsan, Ruslan Mitkov (2024). Dataset: WMT 2020 Sentence-Level Direct Assessment dataset. https://doi.org/10.57702/xn8xolfq

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2010.05318
Author	Tharindu Ranasinghe
More Authors	Constantin Orăsan, Ruslan Mitkov