-
OpenSubtitles2018
This dataset is used to evaluate the performance of context-aware machine translation systems. It consists of English-Russian subtitles with varying levels of context. -
Wikipedia Comparable Corpora
Multilingual dataset for topic modeling based on aligned Wikipedia articles extracted from Wikipedia Comparable Corpora