-
IWSLT Vietnamese→English and ACL Romanian→English datasets
IWSLT Vietnamese→English and ACL Romanian→English datasets. -
Vietnamese Diacritic Restoration Dataset
The dataset used for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs. -
IWSLT 2014 English-to-Turkish
The English-to-Turkish task of the IWSLT 2014 dataset. -
IWSLT 2014 English-to-Portuguese
The English-to-Portuguese task of the IWSLT 2014 dataset. -
IWSLT 2014 English-to-German
The English-to-German task of the IWSLT 2014 dataset. -
WMT 2010 and WMT 2012 datasets
The paper uses the WMT 2010 and WMT 2012 datasets, which cover machine translation tasks. -
Machine Translation and Automated Analysis of the Sumerian Language Dataset
The Machine Translation and Automated Analysis of the Sumerian Language dataset, which contains Sumerian texts in cuneiform script. -
WMT18 data
The dataset used in the paper is the WMT18 data. -
Eu-En: Basque-English dataset
The Basque-English dataset (eu-en) has been collected from the WMT16 IT-domain translation shared task. -
Cs-En: Czech-English dataset
The Czech-English dataset (cs-en) is from the IWSLT 2016 TED talks translation task. -
En-Fr: English-French dataset
The English-French dataset (en-fr) has been sourced from the IWSLT 2016 translation shared task. -
De-En: German-English dataset
Four different language pairs have been selected for the experiments. The datasets' size varies from tens of thousands to millions of sentences to test the regularizers' ability... -
DocRepair dataset
The dataset used for testing the DocRepair model, containing 30 million groups of 4 consecutive sentences in English and Russian. -
WMT 2014 English-German task
The dataset used for the Second Workshop on Neural Machine Translation and Generation. -
IWSLT2014 dataset
Tatoeba and IWSLT2014 datasets for machine translation.