COMET: A neural framework for MT evaluation
The COMET dataset contains human-annotated scores for machine translation candidates.
WMT2020 Metrics Shared Task
The WMT2020 Metrics Shared Task dataset contains human-annotated scores for machine translation candidates.
RoBLEURT Submission for the WMT2021 Metrics Task
RoBLEURT robustly optimizes the training of BLEURT, a trainable metric model for evaluating the semantic consistency between machine translation candidates and golden...
WMT’14 English-French and WMT’19 German-English datasets
Two types of datasets are used: traditional bilingual datasets and domain adaptation datasets.
English-Hindi Parallel Corpus
The parallel corpus used for training and testing the English-Hindi machine translation systems.
English-Hindi Outputs Quality Estimation using Naive Bayes Classifier
The dataset used for training and testing the Naive Bayes classifier for quality estimation of English-Hindi machine translation outputs.
Newstest2012 and Newstest2013
Newstest2012 and Newstest2013 are used as test sets for evaluating the proposed approach.
WMT2021 Shared Task on Machine Translation Using Terminologies
The dataset released for the WMT2021 shared task on machine translation using terminologies; it consists of 4.53M sentence pairs.
WMT dataset
Machine translation data released through WMT, covering various language pairs.
English-to-Chinese Controlled Machine Translation
The dataset for English-to-Chinese controlled machine translation.
Chinese-to-English Controlled Machine Translation
The dataset for Chinese-to-English controlled machine translation.
English Controlled Machine Translation
The dataset for English controlled machine translation.
WMT 2023 Metrics Shared Task
Findings of the WMT 2023 shared task on automatic post-editing.
Europarl English-Romanian dataset
The Europarl English-Romanian dataset.
IWSLT Vietnamese→English and ACL Romanian→English datasets
IWSLT Vietnamese→English and ACL Romanian→English datasets.
Vietnamese Diacritic Restoration Dataset
The dataset used for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs.