Machine Translation - Groups

COMET: A neural framework for MT evaluation

The COMET dataset contains human-annotated scores for machine translation candidates.

Dataset
JSON

WMT2020 Metrics Shared Task

The WMT2020 Metrics Shared Task dataset contains human-annotated scores for machine translation candidates.

Dataset
JSON

RoBLEURT Submission for the WMT2021 Metrics Task

RoBLEURT robustly optimizes the training of BLEURT, a trainable metric model for evaluating the semantic consistency between machine translation candidates and golden...

Dataset
JSON

Umsuka English-isiZulu Parallel Corpus

The Umsuka English-isiZulu Parallel Corpus provides a novel, high-quality parallel dataset for machine translation, containing English sentences sampled from both News Crawl...

Dataset
JSON

WAT2015

The dataset used in the paper is the WAT2015 translation task from Japanese (ja) to/from English (en) and Chinese (zh).

Dataset
JSON

WMT’14 English-French and WMT’19 German-English datasets

Two types of datasets: traditional bilingual and domain adaptation datasets.

Dataset
JSON

WMT 2020 Sentence-Level Direct Assessment dataset

The dataset used in the competition for Sentence-Level Direct Assessment shared task is composed of data extracted from Wikipedia for six language pairs, consisting of...

Dataset
JSON

English-Hindi Parallel Corpus

An English-Hindi parallel corpus used for training and testing the machine translation systems.

Dataset
JSON

English-Hindi Outputs Quality Estimation using Naive Bayes Classifier

The dataset used for training and testing the Naive Bayes classifier for quality estimation of English-Hindi outputs.

Dataset
JSON

Newstest2012 and Newstest2013

Newstest2012 and Newstest2013 are used for testing the proposed approach.

Dataset
JSON

WMT2021 Shared Task on Machine Translation Using Terminologies

The dataset used in this paper is the WMT2021 shared task on machine translation using terminologies, which consists of 4.53M sentence pairs.

Dataset
JSON

WMT2014 German-English Translation Task

The dataset used in this paper is the WMT2014 German-English translation task, which consists of 4.51M parallel sentence pairs.

Dataset
JSON

Shifts Machine Translation dataset

The Shifts Machine Translation dataset consists of pairs of source and target sentences in English and Russian.

Dataset
JSON

ParCor Dataset

The ParCor dataset is a parallel corpus of annotated pronouns.

Dataset
JSON

WIT3 Parallel Corpus

The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks.

Dataset
JSON

WMT 2016 Task on Cross-Lingual Pronoun Prediction

The WMT 2016 task on cross-lingual pronoun prediction is a classification task in which participants are asked to provide predictions on what pronoun class label should replace a...

Dataset
JSON

WMT 2015

A German-to-English parallel corpus used for building the NMT model.

Dataset
JSON

WMT dataset

The dataset used in the paper is the WMT dataset, which contains machine translation data for various language pairs.

Dataset
JSON

WMT’17 metrics task

The dataset used in the paper for validation studies of automatic metrics in natural language generation evaluation.

Dataset
JSON

Recurrent Continuous Translation Models

A neural machine translation toolkit that uses maximum likelihood as the training criterion.

Dataset
JSON

106 datasets found