-
No language left behind: Scaling human-centered machine translation
The dataset is used to train multilingual language models and to test their performance. -
WMT'14 English-German, WAT'17 Japanese-English, and WMT'17 Chinese-English tr...
The datasets used in the paper are the WMT'14 English-German, WAT'17 Japanese-English, and WMT'17 Chinese-English translation tasks. -
WMT14 English-French
The dataset used for the bilingual resynchronization task; it includes WMT14 English-French data and a small parallel sentence compression dataset. -
Bilingual Synchronization
The dataset used for the bilingual synchronization task, which covers simulated interactive MT, translation with a Translation Memory (TM), and TM cleaning. -
TED Polish-to-English translation system
The dataset is used to test the proposed methodologies for mining parallel data from comparable corpora. -
Mining parallel data from comparable corpora
This research explores new methodologies for mining parallel data from previously obtained comparable corpora. -
Neural machine translation by jointly learning to align and translate
Neural machine translation by jointly learning to align and translate. -
Diabla: A Corpus of Bilingual Spontaneous Written Dialogues
A corpus of bilingual spontaneous written dialogues for machine translation. -
Various Machine Translation datasets
The paper does not describe the dataset explicitly; it mentions that the authors used various datasets for machine translation tasks. -
Moses Toolkit dataset
The paper does not describe the dataset explicitly; it mentions that the authors used the Moses toolkit to tokenize sentences and split words into subword units. -
IT, Koran, Medical, and Law datasets
The paper does not describe the dataset explicitly; it mentions that the authors used four commonly used benchmarks: IT, Koran, Medical, and Law. -
Moses: Open Source Toolkit for Statistical Machine Translation
Moses: Open source toolkit for statistical machine translation. -
Continuous Space Translation Models for Phrase-Based Statistical Machine Tran...
Continuous space translation models for phrase-based statistical machine translation. -
Deterministic Non-Autoregressive Neural Sequence Modeling
The proposed model is designed based on the principles of latent variable models and denoising autoencoders, and is generally applicable to any sequence generation task. -
Multilingual Data Set for Same-Gender Relationships
Multilingual data set of sentence templates for a variety of relationships in French, Italian, and Spanish.