6 datasets found

Tags: Low-Resource Languages

  • Sumerian Cuneiform Dataset

    A dataset for the study of Sumerian cuneiform, covering part-of-speech tagging, named entity recognition, and machine translation.
  • AfriSenti

    AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages
  • UK-PODS-ALIGN

    This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS-ALIGN dataset features modern conversational...
  • UK-PODS

    This work showcases a cost-effective method for generating training data for speech processing tasks. The UK-PODS dataset features modern conversational Ukrainian.
  • Ligurian Monolingual Corpus

    The first open-source monolingual corpus for Ligurian.
  • Normalized Ligurian Corpus

    A dataset of 4,394 Ligurian sentences in different spelling systems paired with normalized versions.