-
Turkish-English and Uyghur-Chinese machine translation tasks
The paper uses data from the Turkish-English and Uyghur-Chinese machine translation tasks. -
WMT22 Translation Suggestion Task
This is the dataset from the WMT22 Shared Task on Translation Suggestion (TS). -
WMT datasets
WMT datasets are large-scale machine translation datasets. -
LDC 2002 English-Chinese Dataset
The LDC 2002 English-Chinese dataset is used for testing the proposed approach. -
WMT 2016 English-German Dataset
The WMT 2016 English-German dataset is used for testing the proposed approach. -
WMT 2014 English-French Dataset
The WMT 2014 English-French dataset is used for testing the proposed approach. -
IWSLT'14 German-English Translation Dataset
The dataset contains 160K sentence pairs for German-English translation. -
WMT17 Chinese-English Translation Dataset
The dataset contains 20M sentence pairs for Chinese-English translation. -
IWSLT 2014
The IWSLT 2014 German-to-English dataset is a machine translation dataset, containing 153K sentence pairs. -
Workshop of Machine Translation 2018
The Workshop of Machine Translation 2018 dataset is used to train the text machine translation models. -
WMT 2014 English-German
The paper uses the WMT 2014 English-German dataset, a machine translation dataset. -
United Nations Parallel Corpus
The data consists of high-quality human translations from books, leveraging the inductive bias that high-quality human translations are superior to machine-generated translations. -
Penn Treebank
The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths. -
Librispeech
The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers. -
IWSLT14 EN→DE, WMT14 EN→DE, WMT16 EN→DE
The paper does not describe its data explicitly, but it states that the authors used the IWSLT14 EN→DE, WMT14 EN→DE, and WMT16 EN→DE tasks. -
WMT’16 English-Romanian dataset
The WMT’16 English-Romanian dataset was used for the machine translation task.