Machine Translation - Groups

COMET: A neural framework for MT evaluation

The COMET dataset contains human-annotated scores for machine translation candidates.
- Dataset
- JSON
WMT2020 Metrics Shared Task

The WMT2020 Metrics Shared Task dataset contains human-annotated scores for machine translation candidates.
- Dataset
- JSON
RoBLEURT Submission for the WMT2021 Metrics Task

RoBLEURT stands for robustly optimizing the training of BLEURT, a trainable metric model for evaluating the semantic consistency between machine translation candidates and golden...
- Dataset
- JSON
WAT2015

The dataset used in the paper is from the WAT2015 translation task, covering Japanese (ja) to/from English (en) and Chinese (zh).
- Dataset
- JSON
WMT’14 English-French and WMT’19 German-English datasets

Two types of datasets: traditional bilingual and domain adaptation datasets.
- Dataset
- JSON
English-Hindi Parallel Corpus

The dataset used for training and testing the machine translation systems.
- Dataset
- JSON
English-Hindi Outputs Quality Estimation using Naive Bayes Classifier

The dataset used for training and testing the Naive Bayes classifier for quality estimation of English-Hindi outputs.
- Dataset
- JSON
Newstest2012 and Newstest2013

Newstest2012 and Newstest2013 are used for testing the proposed approach.
- Dataset
- JSON
WMT2021 Shared Task on Machine Translation Using Terminologies

The dataset used in this paper comes from the WMT2021 shared task on machine translation using terminologies and consists of 4.53M sentence pairs.
- Dataset
- JSON
WMT 2015

A German-to-English parallel corpus used for building the NMT model.
- Dataset
- JSON
WMT dataset

The dataset used in the paper is the WMT dataset, which contains machine translation data for various language pairs.
- Dataset
- JSON
English-to-Chinese Controlled Machine Translation

The dataset for English-to-Chinese controlled machine translation.
- Dataset
- JSON
Chinese-to-English Controlled Machine Translation

The dataset for Chinese-to-English controlled machine translation.
- Dataset
- JSON
English Controlled Machine Translation

The dataset for English controlled machine translation.
- Dataset
- JSON
WMT 2023

Findings of the 2023 conference on machine translation (WMT23)
- Dataset
- JSON
WMT 2023 Metrics Shared Task

Findings of the WMT 2023 shared task on automatic post-editing
- Dataset
- JSON
XTOWER

A multilingual LLM for explaining and correcting translation errors
- Dataset
- JSON
Europarl English Romanian dataset

Europarl English Romanian dataset.
- Dataset
- JSON
IWSLT Vietnamese→English and ACL Romanian→English datasets

IWSLT Vietnamese→English and ACL Romanian→English datasets.
- Dataset
- JSON
Vietnamese Diacritic Restoration Dataset

The dataset used for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs.
- Dataset
- JSON

75 datasets found