WMT 2020 Sentence-Level Direct Assessment dataset

doi:10.57702/xn8xolfq

The dataset used in the Sentence-Level Direct Assessment shared task consists of data extracted from Wikipedia for six language pairs: the high-resource pairs English-German (En-De) and English-Chinese (En-Zh), the medium-resource pairs Romanian-English (Ro-En) and Estonian-English (Et-En), and the low-resource pairs Sinhala-English (Si-En) and Nepalese-English (Ne-En). It additionally includes a Russian-English (Ru-En) dataset that combines articles from Wikipedia and Reddit.

BibTeX:

@dataset{Tharindu_Ranasinghe_and_Constantin_Or˘asan_and_Ruslan_Mitkov_2024,
    abstract = {The dataset used in the Sentence-Level Direct Assessment shared task consists of data extracted from Wikipedia for six language pairs: the high-resource pairs English-German (En-De) and English-Chinese (En-Zh), the medium-resource pairs Romanian-English (Ro-En) and Estonian-English (Et-En), and the low-resource pairs Sinhala-English (Si-En) and Nepalese-English (Ne-En). It additionally includes a Russian-English (Ru-En) dataset that combines articles from Wikipedia and Reddit.},
    author = {Tharindu Ranasinghe and Constantin Orăsan and Ruslan Mitkov},
    doi = {10.57702/xn8xolfq},
    institution = {No Organization},
    keyword = {Quality Estimation, Translation, Wikipedia},
    month = {dec},
    publisher = {TIB},
    title = {WMT 2020 Sentence-Level Direct Assessment dataset},
    url = {https://service.tib.eu/ldmservice/dataset/wmt-2020-sentence-level-direct-assessment-dataset},
    year = {2024}
}