3 datasets found

Organizations: No Organization

Filter Results
  • OpenSubtitles2018

    This dataset is used to evaluate the performance of context-aware machine translation systems. It consists of English-Russian subtitles with varying levels of context.
  • Wikipedia Comparable Corpora

    Multilingual dataset for topic modeling based on aligned Wikipedia articles extracted from Wikipedia Comparable Corpora
  • OSCAR

    The OSCAR corpus is a multilingual web corpus that is used for pre-training large generative language models. It is a document-oriented corpus that is comparable in size and...