Machine Translation - Groups

Generated Template Sentences for Same-Gender Relationships

Generated template sentences for a variety of relationships in French, Italian, and Spanish, using the format “OCCUPATION RELATIONSHIP-VERB HIS/HER RELATIONSHIP-TARGET.”
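As a rough illustration of how such templates expand, the sketch below fills every slot combination with a Cartesian product. It assumes the HIS/HER possessive sits between the relationship verb and the target, and that the possessive marks the subject's gender. The English filler words are hypothetical placeholders because the listing does not include the dataset's actual French, Italian, or Spanish word lists.

```python
from itertools import product

# Hypothetical filler lists; the dataset's real word lists are not given here.
OCCUPATIONS = ["doctor", "teacher"]
RELATIONSHIP_VERBS = ["married", "divorced"]
POSSESSIVES = ["his", "her"]
RELATIONSHIP_TARGETS = ["husband", "wife"]

# Assumed slot order: OCCUPATION RELATIONSHIP-VERB HIS/HER RELATIONSHIP-TARGET.
# The leading "The" is added only so the English placeholders read naturally.
TEMPLATE = "The {occupation} {verb} {possessive} {target}."


def generate_sentences():
    """Fill every slot combination and flag whether the couple is same-gender."""
    rows = []
    for occ, verb, poss, target in product(
        OCCUPATIONS, RELATIONSHIP_VERBS, POSSESSIVES, RELATIONSHIP_TARGETS
    ):
        # Assumption: the possessive encodes the subject's gender.
        subject_gender = "m" if poss == "his" else "f"
        target_gender = "m" if target == "husband" else "f"
        rows.append({
            "sentence": TEMPLATE.format(
                occupation=occ, verb=verb, possessive=poss, target=target
            ),
            "same_gender": subject_gender == target_gender,
        })
    return rows
```

With two fillers per slot this yields 16 sentences, half of which (for example "The doctor married his husband.") describe same-gender relationships.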

English to Hebrew Transliteration

Dataset used for transliterating person names from English to Hebrew, supporting both backward transliteration of Hebrew names and sideways transliteration of Arabic names.

IWSLT 2014 Shared Task Dataset

The IWSLT 2014 shared task dataset contains 152K, 156K, 141K and 172K training sentences for the de-en, zh-en, en-tr and en-es language pairs, respectively.

XLNet: Generalized Autoregressive Pretraining for Language Understanding

XLNet is a generalized autoregressive pretraining model for language understanding.

RoBERTa: A Robustly Optimized BERT Pre-training Approach

RoBERTa is a robustly optimized BERT pre-training approach.

MARGE: A Pre-trained Sequence-to-Sequence Model for Multi-lingual Paraphrasing

MARGE is a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective.

WMT'15

Dataset used for character-level neural machine translation (NMT) on the English-to-German, English-to-Czech and English-to-Finnish language pairs.

OPUS-100

The dataset used in the paper is a subset of the OPUS-MT dataset, containing 1M randomly sampled examples from the OPUS-100 dataset.

LNMap: Departures from Isomorphic Assumption in Bilingual Lexicon Induction

LNMap departs from the isomorphic assumption in bilingual lexicon induction by learning a non-linear mapping in latent space.

Learning Principled Bilingual Word Embeddings

Learns principled bilingual mappings of word embeddings while preserving monolingual invariance.

RAPO: An Adaptive Ranking Paradigm for Bilingual Lexicon Induction

Bilingual lexicon induction infers word translations by aligning word embeddings trained independently in two languages.
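The alignment step can be sketched with the orthogonal Procrustes solution, a standard baseline for mapping one embedding space onto another; it is not necessarily what RAPO itself does. Given a seed dictionary of row-aligned source/target vectors, the orthogonal map is `U @ Vt` from the SVD of `src.T @ tgt`, and each source word is then translated to its nearest target neighbor by cosine similarity. The embeddings in `toy_demo` are synthetic: the "source language" is a hidden rotation of the target space plus small noise.

```python
import numpy as np


def normalize(x):
    """L2-normalize each row so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def procrustes(src, tgt):
    """Orthogonal W minimizing ||src @ W - tgt||_F for row-aligned seed pairs."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def induce_lexicon(src_emb, tgt_emb, w):
    """Map source vectors into the target space; return the nearest target index per word."""
    sims = normalize(src_emb @ w) @ normalize(tgt_emb).T
    return sims.argmax(axis=1)


def toy_demo(seed=0, vocab=50, dim=10, n_seed=20):
    """Synthetic check: recover a hidden rotation from a small seed dictionary."""
    rng = np.random.default_rng(seed)
    tgt = rng.normal(size=(vocab, dim))
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # hidden orthogonal rotation
    # Source space is the target space rotated by q.T, plus small noise.
    src = tgt @ q.T + 0.01 * rng.normal(size=(vocab, dim))
    w = procrustes(src[:n_seed], tgt[:n_seed])  # learn only from the seed pairs
    pred = induce_lexicon(src, tgt, w)
    accuracy = float((pred == np.arange(vocab)).mean())
    return accuracy, w
```

Constraining the map to be orthogonal preserves distances within each monolingual space, which is why the method works well when the two spaces are close to isomorphic and degrades when they are not.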

NIST Chinese-English

Dataset used in experiments on simultaneous neural machine translation.

WMT15 English-German

Dataset used in experiments on simultaneous neural machine translation.

IWSLT16 German-English

Dataset used in experiments on simultaneous neural machine translation.

WMT17 Zh-En

Dataset used for non-autoregressive machine translation.

WMT14 En-De

The WMT14 En-De dataset contains 4.5M pairs of English and German sentences.

newstest2019.orig-en.p

The paraphrased reference translations used for the experiments in the paper.

newstest2018.orig-en.p

The paraphrased reference translations used for the experiments in the paper.

WMT 2019 English-German news translation task

News translation data from the WMT 2019 English-German task, used for the experiments in the paper.

Vakyansh

The dataset is used for training and testing the proposed punctuation restoration and inverse text normalization models.
