Machine Translation - Groups

XNMT: The eXtensible Neural Machine Translation Toolkit

XNMT is a neural machine translation toolkit that focuses on modular code design, making it easy to swap different parts of the model in and out.
MADAR dataset

The MADAR dataset is a parallel corpus of Arabic dialects, which are low-resource language varieties.
ArzEnSEG corpus

The ArzEnSEG corpus is a morphologically annotated dataset for code-switched Egyptian Arabic-English.
ArzEn parallel corpus

The ArzEn parallel corpus consists of speech transcriptions gathered through informal interviews with bilingual Egyptian Arabic-English speakers, as well as their English...
English-to-Chinese Controlled Machine Translation

The dataset for English-to-Chinese controlled machine translation.
Chinese-to-English Controlled Machine Translation

The dataset for Chinese-to-English controlled machine translation.
English Controlled Machine Translation

The dataset for English controlled machine translation.
IWSLT 2017

A collection of text for machine translation, used in the paper to train a single machine translation system that covers multiple language directions.
WMT 2023

Findings of the 2023 Conference on Machine Translation (WMT23).
WMT 2023 Metrics Shared Task

Findings of the WMT 2023 shared task on automatic post-editing
XTOWER

A multilingual LLM for explaining and correcting translation errors
Europarl English-Romanian dataset

The English-Romanian portion of the Europarl corpus.
IWSLT Vietnamese→English and ACL Romanian→English datasets

IWSLT Vietnamese→English and ACL Romanian→English datasets.
French-English Translation Task

The dataset used in the paper, drawn from a French-English translation task.
Vietnamese Diacritic Restoration Dataset

The dataset used for the Vietnamese diacritic restoration problem, consisting of 180,000 sentence pairs.
IWSLT 2014 English-to-Turkish

English-to-Turkish task of the IWSLT 2014 dataset
IWSLT 2014 English-to-Portuguese

English-to-Portuguese task of the IWSLT 2014 dataset
IWSLT 2014 English-to-German

English-to-German task of the IWSLT 2014 dataset
WMT 2010 and WMT 2012 datasets

The paper uses the WMT 2010 and WMT 2012 datasets, which cover machine translation tasks.
Zh-En Multi-Domain Dataset

The Zh-En multi-domain dataset consists of four balanced domains: news, patent, subtitles, and COVID-19.

106 datasets found