-
No language left behind: Scaling human-centered machine translation
The dataset is used for training and testing the performance of multilingual language models. -
WMT'14 English-German, WAT'17 Japanese-English, and WMT'17 Chinese-English tr...
The datasets used in the paper are the WMT'14 English-German, WAT'17 Japanese-English, and WMT'17 Chinese-English translation tasks. -
WMT14 English-French
The dataset used for the bilingual resynchronization task. It includes WMT14 English-French data and a small parallel sentence compression dataset. -
Bilingual Synchronization
The dataset used for the bilingual synchronization task. It covers simulated interactive MT, translation with a Translation Memory (TM), and TM cleaning. -
Diabla: A Corpus of Bilingual Spontaneous Written Dialogues
A corpus of bilingual spontaneous written dialogues for machine translation. -
Various Machine Translation datasets
The paper does not describe its data explicitly; it mentions only that the authors used various datasets for machine translation tasks. -
Moses Toolkit dataset
The paper does not describe its data explicitly; it mentions only that the authors used the Moses toolkit to tokenize sentences and split words into subword units. -
IT, Koran, Medical, and Law datasets
The paper does not describe its data explicitly; it mentions only that the authors used four commonly used benchmarks: IT, Koran, Medical, and Law. -
IWSLT 2014 Shared Task Dataset
The IWSLT 2014 shared task dataset contains 152K, 156K, 141K and 172K training sentences for the de-en, zh-en, en-tr and en-es language pairs, respectively. -
WMT17 Zh-En
A dataset used for non-autoregressive machine translation. -
WMT14 En-De
The WMT14 En-De dataset contains 4.5M pairs of English and German sentences. -
newstest2019.orig-en.p
The paraphrased reference translations used for the experiments in the paper. -
newstest2018.orig-en.p
The paraphrased reference translations used for the experiments in the paper. -
WMT 2019 English-German news translation task
The dataset used for the experiments in the paper: the data of the English-German news translation task. -
IWSLT 2015 English-Vietnamese
The IWSLT 2015 English-Vietnamese dataset contains around 133k training sentence pairs. -
COMET: A neural framework for MT evaluation
The COMET dataset contains human-annotated scores for machine translation candidates.