-
Umsuka English-isiZulu Parallel Corpus
The Umsuka English-isiZulu Parallel Corpus provides a novel, high-quality parallel dataset for machine translation, containing English sentences sampled from both News Crawl... -
Shifts Machine Translation dataset
The Shifts Machine Translation dataset consists of pairs of source and target sentences in English and Russian. -
ParCor Dataset
The ParCor dataset is a parallel corpus of annotated pronouns. -
WIT3 Parallel Corpus
The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks. -
WMT 2016 Task on Cross-Lingual Pronoun Prediction
The WMT 2016 task on cross-lingual pronoun prediction is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a... -
MADAR dataset
The MADAR dataset is a parallel corpus of Arabic dialects, a low-resource setting. -
ArzEnSEG corpus
The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English. -
ArzEn parallel corpus
The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English... -
IWSLT 2017
The IWSLT 2017 dataset is a collection of text for multilingual machine translation, in which a single machine translation system handles multiple language directions. -
French-English Translation Task
The dataset used in the paper comes from a French-English translation task. -
Zh-En Multi-Domain Dataset
The Zh-En multi-domain dataset consists of four balanced domains: news, patent, subtitles, and COVID-19. -
Machine Translation Datasets
The dataset used in the paper is a collection of adversarial examples and natural examples for machine translation tasks. -
OpenSubtitles2018
This dataset is used to evaluate the performance of context-aware machine translation systems. It consists of English-Russian subtitles with varying levels of context. -
Bleu: a method for automatic evaluation of machine translation
BLEU (bilingual evaluation understudy) is a method for automatic evaluation of machine translation. -
IWSLT 2014
The IWSLT 2014 German-to-English dataset is a machine translation dataset, containing 153K sentence pairs. -
United Nations Parallel Corpus
High-quality human translations from books, used under the inductive bias that high-quality human translations are superior to machine-generated translations. -
Yiyan Corpus
High-quality human translations from books, used under the inductive bias that high-quality human translations are superior to machine-generated translations. -
English-Chinese Books
High-quality human translations from books, used under the inductive bias that high-quality human translations are superior to machine-generated translations.