-
ArzEnSEG corpus
The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English. -
ArzEn parallel corpus
The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English... -
English-to-Chinese Controlled Machine Translation
The dataset for English-to-Chinese controlled machine translation. -
English Controlled Machine Translation
The dataset for English controlled machine translation. -
WMT 2014 English-German task
The dataset used for the Second Workshop on Neural Machine Translation and Generation. -
KFTT datasets
KFTT English↔Japanese translation datasets. -
NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), NIST 2006 (MT06) datasets
Chinese↔English translation tasks, KFTT English↔Japanese translation datasets. -
WIT corpus, SETimes corpus, newsdev2016, newstest2016, and newstest2017
The datasets used in the paper are the WIT corpus, SETimes corpus, newsdev2016, newstest2016, and newstest2017. -
Turkish-English and Uyghur-Chinese machine translation tasks
The datasets used in the paper come from the Turkish-English and Uyghur-Chinese machine translation tasks. -
IWSLT 2014
The IWSLT 2014 German-to-English dataset is a machine translation dataset containing 153K sentence pairs.