Machine Translation - Groups

WAT2015

The dataset used in the paper is from the WAT2015 translation task, covering translation between Japanese (ja) and both English (en) and Chinese (zh).
- Dataset
- JSON

WMT 2015

A German-to-English parallel corpus used for building the NMT model.
- Dataset
- JSON

ArzEnSEG corpus

The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English.
- Dataset
- JSON

ArzEn parallel corpus

The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English...
- Dataset
- JSON

English-to-Chinese Controlled Machine Translation

The dataset for English-to-Chinese controlled machine translation.
- Dataset
- JSON

English Controlled Machine Translation

The dataset for English controlled machine translation.
- Dataset
- JSON

WMT 2014 English-German task

The dataset used for the Second Workshop on Neural Machine Translation and Generation.
- Dataset
- JSON

KFTT datasets

KFTT English↔Japanese translation datasets.
- Dataset
- JSON

NIST 2003 (MT03), NIST 2004 (MT04), NIST 2005 (MT05), NIST 2006 (MT06) datasets

Chinese↔English translation tasks, KFTT English↔Japanese translation datasets.
- Dataset
- JSON

WIT corpus, SETimes corpus, newsdev2016, newstest2016, and newstest2017

The datasets used in the paper are the WIT corpus, the SETimes corpus, newsdev2016, newstest2016, and newstest2017.
- Dataset
- JSON

Turkish-English and Uyghur-Chinese machine translation tasks

The datasets used in the paper are from the Turkish-English and Uyghur-Chinese machine translation tasks.
- Dataset
- JSON

IWSLT 2014

The IWSLT 2014 German-to-English dataset is a machine translation dataset containing 153K sentence pairs.
- Dataset
- JSON

12 datasets found