7 datasets found

Tags: multilingualism

  • FLORES-101 Evaluation Benchmark

    An evaluation benchmark for low-resource and multilingual machine translation.
  • Europarl-ST

    Europarl-ST is a multilingual speech corpus that contains transcriptions of parliamentary debates.
  • MASSIVE

    The MASSIVE dataset is a comprehensive collection of approximately one million annotated utterances for various natural language understanding tasks such as slot-filling, intent...
  • WikiANN

    WikiANN is a multilingual dataset for named entity recognition.
  • MuST-C: a Multilingual Speech Translation Corpus

    MuST-C is a multilingual speech translation corpus.
  • XED

    XED is a multilingual dataset for sentiment analysis.
  • MuST-C

    MuST-C is a multilingual speech translation dataset, which contains at least 385 hours of audio recordings from TED Talks, with their manual transcriptions and translations at...
You can also access this registry using the API (see API Docs).