3 datasets found

Tags: low-resource languages

  • Umsuka English-isiZulu Parallel Corpus

    The Umsuka English-isiZulu Parallel Corpus provides a novel, high-quality parallel dataset for machine translation, containing English sentences sampled from both News Crawl...
  • MADAR dataset

    The MADAR dataset is a parallel corpus of Arabic city dialects, a group of low-resource language varieties.
  • MASSIVE

    The MASSIVE dataset is a comprehensive collection of approximately one million annotated utterances for various natural language understanding tasks such as slot-filling, intent...

You can also access this registry using the API (see API Docs).