- English-Hindi Parallel Corpus
The dataset used for training and testing the machine translation systems.
- OpenSubtitles2018
This dataset is used to evaluate the performance of context-aware machine translation systems. It consists of English-Russian subtitles with varying levels of context.
- WikiMatrix
A multilingual dataset that contains parallel texts between English and other languages.
- United Nations Parallel Corpus
High-quality human translations from books, leveraging the inductive bias that high-quality human translations are superior to machine-generated translations.