WMT 2020 Sentence-Level Direct Assessment dataset

doi:10.57702/xn8xolfq

The dataset used in the WMT 2020 Sentence-Level Direct Assessment shared task consists of data extracted from Wikipedia for six language pairs: the high-resource pairs English-German (En-De) and English-Chinese (En-Zh), the medium-resource pairs Romanian-English (Ro-En) and Estonian-English (Et-En), and the low-resource pairs Sinhala-English (Si-En) and Nepali-English (Ne-En). It also includes a Russian-English (Ru-En) dataset that combines articles from Wikipedia and Reddit.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Tharindu Ranasinghe, Constantin Orăsan, Ruslan Mitkov (2024). Dataset: WMT 2020 Sentence-Level Direct Assessment dataset. https://doi.org/10.57702/xn8xolfq

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2010.05318
Author	Tharindu Ranasinghe
More Authors	Constantin Orăsan, Ruslan Mitkov