6 datasets found

Tags: Parallel Corpus

  • English-Hindi Parallel Corpus

    An English-Hindi parallel corpus used for training and testing machine translation systems.
  • OpenSubtitles2018

    This dataset is used to evaluate the performance of context-aware machine translation systems. It consists of English-Russian subtitles with varying levels of context.
  • IWSLT17

    The IWSLT17 dataset is a multilingual parallel corpus covering 5 languages.
  • PC32

    PC32 is a multilingual parallel corpus of 32 English-centric language pairs.
  • WikiMatrix

    The WikiMatrix dataset is a multilingual dataset that contains parallel texts between English and other languages.
  • United Nations Parallel Corpus

    High-quality human translations from books, leveraging the inductive bias that high-quality human translations are superior to machine-generated translations.
You can also access this registry using the API (see API Docs).