-
Umsuka English-isiZulu Parallel Corpus
The Umsuka English-isiZulu Parallel Corpus provides a novel, high-quality parallel dataset for machine translation, containing English sentences sampled from both News Crawl...
-
ParCor Dataset
The ParCor dataset is a parallel corpus of annotated pronouns.
-
WIT3 Parallel Corpus
The WIT3 parallel corpus is a large-scale corpus of transcribed and translated talks.
-
Europarl Parallel Corpus
The dataset used in this paper is a multi-view dataset, where each view is a matrix of size I x K, with I being the number of entities and K being the number of features. The...
-
Watchtower corpus (WTC)
The dataset used in this paper is the Watchtower corpus (WTC), a multilingual parallel corpus of sentences.
-
DGT corpus
The dataset is a parallel corpus of aligned sentences across nine languages (36 language pairs) from the DGT corpus, used for language comparison experiments.